{
 "cells": [
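  {
   "cell_type": "markdown",
   "id": "fdb0de8f",
   "metadata": {},
   "source": [
    "# DAY10\n",
    "\n",
    "1. Run through every preprocessing step covered so far to review the full workflow; from now on we start directly from the processed data.\n",
    "2. Begin machine learning modeling (simple models, no hyperparameter tuning) and evaluation.\n",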
    "\n"
   ]
  },
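  {
   "cell_type": "markdown",
   "id": "5b3779e4",
   "metadata": {},
   "source": [
    "# I. Data Preprocessing\n",
    "## 1.1 Import the required packages\n",
    "\n",
    "These imports were gathered into this cell after the rest of the notebook was written."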
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 255,
   "id": "ffc25c38",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd    # data processing and analysis for tabular data\n",
    "import numpy as np     # numerical computing with efficient array operations\n",
    "import matplotlib.pyplot as plt    # plotting all kinds of charts\n",
    "import seaborn as sns   # high-level statistical plotting built on matplotlib\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')  # silence warnings to keep the output readable\n",
    "\n",
    "# Set a font that can render Chinese characters (the column names below are Chinese)\n",
    "plt.rcParams['font.sans-serif'] = ['SimHei']  # SimHei is a common Chinese font on Windows\n",
    "plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88ab8ba8",
   "metadata": {},
   "source": [
    "## 1.2 Inspect the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 256,
   "id": "af7e8e8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv(r'E:\\study\\PythonStudy\\python60-days-challenge-master\\data.csv')    # load the dataset; the raw string keeps the Windows backslashes literal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 257,
   "id": "b1686f6f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Id</th>\n",
       "      <th>Home Ownership</th>\n",
       "      <th>Annual Income</th>\n",
       "      <th>Years in current job</th>\n",
       "      <th>Tax Liens</th>\n",
       "      <th>Number of Open Accounts</th>\n",
       "      <th>Years of Credit History</th>\n",
       "      <th>Maximum Open Credit</th>\n",
       "      <th>Number of Credit Problems</th>\n",
       "      <th>Months since last delinquent</th>\n",
       "      <th>Bankruptcies</th>\n",
       "      <th>Purpose</th>\n",
       "      <th>Term</th>\n",
       "      <th>Current Loan Amount</th>\n",
       "      <th>Current Credit Balance</th>\n",
       "      <th>Monthly Debt</th>\n",
       "      <th>Credit Score</th>\n",
       "      <th>Credit Default</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>Own Home</td>\n",
       "      <td>482087.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>26.3</td>\n",
       "      <td>685960.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>47386.0</td>\n",
       "      <td>7914.0</td>\n",
       "      <td>749.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>Own Home</td>\n",
       "      <td>1025487.0</td>\n",
       "      <td>10+ years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>15.3</td>\n",
       "      <td>1181730.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Long Term</td>\n",
       "      <td>264968.0</td>\n",
       "      <td>394972.0</td>\n",
       "      <td>18373.0</td>\n",
       "      <td>737.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Home Mortgage</td>\n",
       "      <td>751412.0</td>\n",
       "      <td>8 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1182434.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>308389.0</td>\n",
       "      <td>13651.0</td>\n",
       "      <td>742.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Own Home</td>\n",
       "      <td>805068.0</td>\n",
       "      <td>6 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>22.5</td>\n",
       "      <td>147400.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>121396.0</td>\n",
       "      <td>95855.0</td>\n",
       "      <td>11338.0</td>\n",
       "      <td>694.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>Rent</td>\n",
       "      <td>776264.0</td>\n",
       "      <td>8 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13.6</td>\n",
       "      <td>385836.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>125840.0</td>\n",
       "      <td>93309.0</td>\n",
       "      <td>7180.0</td>\n",
       "      <td>719.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Id Home Ownership  Annual Income Years in current job  Tax Liens  \\\n",
       "0   0       Own Home       482087.0                  NaN        0.0   \n",
       "1   1       Own Home      1025487.0            10+ years        0.0   \n",
       "2   2  Home Mortgage       751412.0              8 years        0.0   \n",
       "3   3       Own Home       805068.0              6 years        0.0   \n",
       "4   4           Rent       776264.0              8 years        0.0   \n",
       "\n",
       "   Number of Open Accounts  Years of Credit History  Maximum Open Credit  \\\n",
       "0                     11.0                     26.3             685960.0   \n",
       "1                     15.0                     15.3            1181730.0   \n",
       "2                     11.0                     35.0            1182434.0   \n",
       "3                      8.0                     22.5             147400.0   \n",
       "4                     13.0                     13.6             385836.0   \n",
       "\n",
       "   Number of Credit Problems  Months since last delinquent  Bankruptcies  \\\n",
       "0                        1.0                           NaN           1.0   \n",
       "1                        0.0                           NaN           0.0   \n",
       "2                        0.0                           NaN           0.0   \n",
       "3                        1.0                           NaN           1.0   \n",
       "4                        1.0                           NaN           0.0   \n",
       "\n",
       "              Purpose        Term  Current Loan Amount  \\\n",
       "0  debt consolidation  Short Term           99999999.0   \n",
       "1  debt consolidation   Long Term             264968.0   \n",
       "2  debt consolidation  Short Term           99999999.0   \n",
       "3  debt consolidation  Short Term             121396.0   \n",
       "4  debt consolidation  Short Term             125840.0   \n",
       "\n",
       "   Current Credit Balance  Monthly Debt  Credit Score  Credit Default  \n",
       "0                 47386.0        7914.0         749.0               0  \n",
       "1                394972.0       18373.0         737.0               1  \n",
       "2                308389.0       13651.0         742.0               0  \n",
       "3                 95855.0       11338.0         694.0               0  \n",
       "4                 93309.0        7180.0         719.0               0  "
      ]
     },
     "execution_count": 257,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4e4b456",
   "metadata": {},
   "source": [
    "## 1.3 Map feature names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 258,
   "id": "3c1ee3d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Id</th>\n",
       "      <th>房屋所有权</th>\n",
       "      <th>年收入</th>\n",
       "      <th>当前工作年限</th>\n",
       "      <th>税收留置权</th>\n",
       "      <th>开放账户数量</th>\n",
       "      <th>信用历史年限</th>\n",
       "      <th>最大开放信用额度</th>\n",
       "      <th>信用问题数量</th>\n",
       "      <th>距上次拖欠月数</th>\n",
       "      <th>破产次数</th>\n",
       "      <th>贷款目的</th>\n",
       "      <th>贷款期限</th>\n",
       "      <th>当前贷款金额</th>\n",
       "      <th>当前信用余额</th>\n",
       "      <th>月债务</th>\n",
       "      <th>信用评分</th>\n",
       "      <th>信用违约</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>Own Home</td>\n",
       "      <td>482087.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>26.3</td>\n",
       "      <td>685960.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>47386.0</td>\n",
       "      <td>7914.0</td>\n",
       "      <td>749.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>Own Home</td>\n",
       "      <td>1025487.0</td>\n",
       "      <td>10+ years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>15.3</td>\n",
       "      <td>1181730.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Long Term</td>\n",
       "      <td>264968.0</td>\n",
       "      <td>394972.0</td>\n",
       "      <td>18373.0</td>\n",
       "      <td>737.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Home Mortgage</td>\n",
       "      <td>751412.0</td>\n",
       "      <td>8 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1182434.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>308389.0</td>\n",
       "      <td>13651.0</td>\n",
       "      <td>742.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Own Home</td>\n",
       "      <td>805068.0</td>\n",
       "      <td>6 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>22.5</td>\n",
       "      <td>147400.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>121396.0</td>\n",
       "      <td>95855.0</td>\n",
       "      <td>11338.0</td>\n",
       "      <td>694.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>Rent</td>\n",
       "      <td>776264.0</td>\n",
       "      <td>8 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13.6</td>\n",
       "      <td>385836.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>125840.0</td>\n",
       "      <td>93309.0</td>\n",
       "      <td>7180.0</td>\n",
       "      <td>719.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Id          房屋所有权        年收入     当前工作年限  税收留置权  开放账户数量  信用历史年限   最大开放信用额度  \\\n",
       "0   0       Own Home   482087.0        NaN    0.0    11.0    26.3   685960.0   \n",
       "1   1       Own Home  1025487.0  10+ years    0.0    15.0    15.3  1181730.0   \n",
       "2   2  Home Mortgage   751412.0    8 years    0.0    11.0    35.0  1182434.0   \n",
       "3   3       Own Home   805068.0    6 years    0.0     8.0    22.5   147400.0   \n",
       "4   4           Rent   776264.0    8 years    0.0    13.0    13.6   385836.0   \n",
       "\n",
       "   信用问题数量  距上次拖欠月数  破产次数                贷款目的        贷款期限      当前贷款金额  \\\n",
       "0     1.0      NaN   1.0  debt consolidation  Short Term  99999999.0   \n",
       "1     0.0      NaN   0.0  debt consolidation   Long Term    264968.0   \n",
       "2     0.0      NaN   0.0  debt consolidation  Short Term  99999999.0   \n",
       "3     1.0      NaN   1.0  debt consolidation  Short Term    121396.0   \n",
       "4     1.0      NaN   0.0  debt consolidation  Short Term    125840.0   \n",
       "\n",
       "     当前信用余额      月债务   信用评分  信用违约  \n",
       "0   47386.0   7914.0  749.0     0  \n",
       "1  394972.0  18373.0  737.0     1  \n",
       "2  308389.0  13651.0  742.0     0  \n",
       "3   95855.0  11338.0  694.0     0  \n",
       "4   93309.0   7180.0  719.0     0  "
      ]
     },
     "execution_count": 258,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Dictionary mapping the original English feature names to Chinese column names\n",
    "feature_name_mapping = {\n",
    "    'Annual Income': '年收入',\n",
    "    'Years in current job': '当前工作年限',\n",
    "    'Tax Liens': '税收留置权',\n",
    "    'Number of Open Accounts': '开放账户数量',\n",
    "    'Years of Credit History': '信用历史年限',\n",
    "    'Maximum Open Credit': '最大开放信用额度',\n",
    "    'Number of Credit Problems': '信用问题数量',\n",
    "    'Months since last delinquent': '距上次拖欠月数',\n",
    "    'Bankruptcies': '破产次数',\n",
    "    'Current Loan Amount': '当前贷款金额',\n",
    "    'Current Credit Balance': '当前信用余额',\n",
    "    'Monthly Debt': '月债务',\n",
    "    'Credit Score': '信用评分',\n",
    "    'Home Ownership': '房屋所有权',\n",
    "    'Term': '贷款期限',\n",
    "    'Purpose': '贷款目的',\n",
    "    'Credit Default': '信用违约'\n",
    "}\n",
    "\n",
    "# Rename the DataFrame columns\n",
    "data = data.rename(columns=feature_name_mapping)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4297fdb",
   "metadata": {},
   "source": [
    "## 1.4 Drop unused columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 259,
   "id": "f60634f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>房屋所有权</th>\n",
       "      <th>年收入</th>\n",
       "      <th>当前工作年限</th>\n",
       "      <th>税收留置权</th>\n",
       "      <th>开放账户数量</th>\n",
       "      <th>信用历史年限</th>\n",
       "      <th>最大开放信用额度</th>\n",
       "      <th>信用问题数量</th>\n",
       "      <th>距上次拖欠月数</th>\n",
       "      <th>破产次数</th>\n",
       "      <th>贷款目的</th>\n",
       "      <th>贷款期限</th>\n",
       "      <th>当前贷款金额</th>\n",
       "      <th>当前信用余额</th>\n",
       "      <th>月债务</th>\n",
       "      <th>信用评分</th>\n",
       "      <th>信用违约</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Own Home</td>\n",
       "      <td>482087.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>26.3</td>\n",
       "      <td>685960.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>47386.0</td>\n",
       "      <td>7914.0</td>\n",
       "      <td>749.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Own Home</td>\n",
       "      <td>1025487.0</td>\n",
       "      <td>10+ years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>15.3</td>\n",
       "      <td>1181730.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Long Term</td>\n",
       "      <td>264968.0</td>\n",
       "      <td>394972.0</td>\n",
       "      <td>18373.0</td>\n",
       "      <td>737.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Home Mortgage</td>\n",
       "      <td>751412.0</td>\n",
       "      <td>8 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1182434.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>308389.0</td>\n",
       "      <td>13651.0</td>\n",
       "      <td>742.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Own Home</td>\n",
       "      <td>805068.0</td>\n",
       "      <td>6 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>22.5</td>\n",
       "      <td>147400.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>121396.0</td>\n",
       "      <td>95855.0</td>\n",
       "      <td>11338.0</td>\n",
       "      <td>694.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Rent</td>\n",
       "      <td>776264.0</td>\n",
       "      <td>8 years</td>\n",
       "      <td>0.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13.6</td>\n",
       "      <td>385836.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>debt consolidation</td>\n",
       "      <td>Short Term</td>\n",
       "      <td>125840.0</td>\n",
       "      <td>93309.0</td>\n",
       "      <td>7180.0</td>\n",
       "      <td>719.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           房屋所有权        年收入     当前工作年限  税收留置权  开放账户数量  信用历史年限   最大开放信用额度  \\\n",
       "0       Own Home   482087.0        NaN    0.0    11.0    26.3   685960.0   \n",
       "1       Own Home  1025487.0  10+ years    0.0    15.0    15.3  1181730.0   \n",
       "2  Home Mortgage   751412.0    8 years    0.0    11.0    35.0  1182434.0   \n",
       "3       Own Home   805068.0    6 years    0.0     8.0    22.5   147400.0   \n",
       "4           Rent   776264.0    8 years    0.0    13.0    13.6   385836.0   \n",
       "\n",
       "   信用问题数量  距上次拖欠月数  破产次数                贷款目的        贷款期限      当前贷款金额  \\\n",
       "0     1.0      NaN   1.0  debt consolidation  Short Term  99999999.0   \n",
       "1     0.0      NaN   0.0  debt consolidation   Long Term    264968.0   \n",
       "2     0.0      NaN   0.0  debt consolidation  Short Term  99999999.0   \n",
       "3     1.0      NaN   1.0  debt consolidation  Short Term    121396.0   \n",
       "4     1.0      NaN   0.0  debt consolidation  Short Term    125840.0   \n",
       "\n",
       "     当前信用余额      月债务   信用评分  信用违约  \n",
       "0   47386.0   7914.0  749.0     0  \n",
       "1  394972.0  18373.0  737.0     1  \n",
       "2  308389.0  13651.0  742.0     0  \n",
       "3   95855.0  11338.0  694.0     0  \n",
       "4   93309.0   7180.0  719.0     0  "
      ]
     },
     "execution_count": 259,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop columns that carry no predictive information, such as the Id column\n",
    "data = data.drop(columns=['Id'])\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44e5468f",
   "metadata": {},
   "source": [
    "## 1.5 Encode categorical features"
   ]
  },
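  {
   "cell_type": "markdown",
   "id": "97e2af8d",
   "metadata": {},
   "source": [
    "First, print the distributions of the features that may need encoding: 房屋所有权 (Home Ownership), 当前工作年限 (Years in current job), 贷款目的 (Purpose), and 贷款期限 (Term).\n",
    "\n",
    "We no longer pick these columns by dtype, because that approach is too coarse; in practice you have to understand the data itself."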
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 260,
   "id": "5f205011",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "房屋所有权\n",
      "Home Mortgage    3637\n",
      "Rent             3204\n",
      "Own Home          647\n",
      "Have Mortgage      12\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Check the distribution of 房屋所有权 (Home Ownership) before encoding it\n",
    "print(data['房屋所有权'].value_counts())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 261,
   "id": "6eba4ac9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "房屋所有权\n",
       "1    3649\n",
       "0    3204\n",
       "2     647\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 261,
     "metadata": {},
     "output_type": "execute_result"
    }
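   ],
   "source": [
    "# Map 房屋所有权 (Home Ownership) to integer codes\n",
    "mappings = {\n",
    "    \"房屋所有权\": {\n",
    "        \"Rent\": 0, # renting\n",
    "        \"Have Mortgage\": 1, # has a mortgage\n",
    "        \"Home Mortgage\": 1, # has a mortgage (same code as Have Mortgage)\n",
    "        \"Own Home\": 2, # owns the home outright\n",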
    "    }\n",
    "}\n",
    "data = data.replace(mappings)\n",
    "data['房屋所有权'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 262,
   "id": "a5b082e4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "当前工作年限\n",
      "10+ years    2332\n",
      "2 years       705\n",
      "3 years       620\n",
      "< 1 year      563\n",
      "5 years       516\n",
      "1 year        504\n",
      "4 years       469\n",
      "6 years       426\n",
      "7 years       396\n",
      "8 years       339\n",
      "9 years       259\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(data['当前工作年限'].value_counts())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 263,
   "id": "0be19a2f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "当前工作年限\n",
       "10.0    2332\n",
       "2.0      705\n",
       "3.0      620\n",
       "0.0      563\n",
       "5.0      516\n",
       "1.0      504\n",
       "4.0      469\n",
       "6.0      426\n",
       "7.0      396\n",
       "8.0      339\n",
       "9.0      259\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 263,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Map 当前工作年限 (Years in current job) to numeric years; '< 1 year' becomes 0 and '10+ years' becomes 10\n",
    "work_year_mappings = {\n",
    "    \"当前工作年限\": {\n",
    "        \"< 1 year\": 0,\n",
    "        \"1 year\": 1,\n",
    "        \"2 years\": 2,\n",
    "        \"3 years\": 3,\n",
    "        \"4 years\": 4,\n",
    "        \"5 years\": 5,\n",
    "        \"6 years\": 6,\n",
    "        \"7 years\": 7,\n",
    "        \"8 years\": 8,\n",
    "        \"9 years\": 9,\n",
    "        \"10+ years\": 10\n",
    "    }\n",
    "}\n",
    "data = data.replace(work_year_mappings)\n",
    "data['当前工作年限'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 264,
   "id": "f9a8471c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "贷款目的\n",
      "debt consolidation      5944\n",
      "other                    665\n",
      "home improvements        412\n",
      "business loan            129\n",
      "buy a car                 96\n",
      "medical bills             71\n",
      "major purchase            40\n",
      "take a trip               37\n",
      "buy house                 34\n",
      "small business            26\n",
      "wedding                   15\n",
      "moving                    11\n",
      "educational expenses      10\n",
      "vacation                   8\n",
      "renewable energy           2\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(data['贷款目的'].value_counts())"
   ]
  },
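  {
   "cell_type": "markdown",
   "id": "a6a7fcac",
   "metadata": {},
   "source": [
    "贷款目的 (Purpose) is a nominal feature that would normally call for one-hot encoding. For convenience we stick with label encoding here; there is no fixed rule. Keep in mind that the integer codes impose an arbitrary order on the categories.\n",
    "\n",
    "As noted earlier, if you do use one-hot encoding, it is better to fill in the missing values first and then encode.\n"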
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 265,
   "id": "e0515eac",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "贷款目的\n",
      "0     5944\n",
      "1      665\n",
      "2      412\n",
      "3      129\n",
      "4       96\n",
      "5       71\n",
      "6       40\n",
      "7       37\n",
      "8       34\n",
      "9       26\n",
      "10      15\n",
      "11      11\n",
      "12      10\n",
      "13       8\n",
      "14       2\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Define the mapping in the descending frequency order reported by value_counts()\n",
    "purpose_ordinal_map = {\n",
    "    \"debt consolidation\": 0,\n",
    "    \"other\": 1,\n",
    "    \"home improvements\": 2,\n",
    "    \"business loan\": 3,\n",
    "    \"buy a car\": 4,\n",
    "    \"medical bills\": 5,\n",
    "    \"major purchase\": 6,\n",
    "    \"take a trip\": 7,\n",
    "    \"buy house\": 8,\n",
    "    \"small business\": 9,\n",
    "    \"wedding\": 10,\n",
    "    \"moving\": 11,\n",
    "    \"educational expenses\": 12,\n",
    "    \"vacation\": 13,\n",
    "    \"renewable energy\": 14\n",
    "}\n",
    "\n",
    "# Replace the original column with the integer codes\n",
    "data['贷款目的'] = data['贷款目的'].replace(purpose_ordinal_map)\n",
    "\n",
    "print(data['贷款目的'].value_counts())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 266,
   "id": "99aa0309",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "贷款期限\n",
      "Short Term    5556\n",
      "Long Term     1944\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(data['贷款期限'].value_counts())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 267,
   "id": "39f1a289",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "贷款期限\n",
       "0    5556\n",
       "1    1944\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 267,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Map 贷款期限 (Term) to a binary code: Short Term = 0, Long Term = 1\n",
    "term_mapping = {\n",
    "        \"Short Term\": 0,\n",
    "        \"Long Term\": 1\n",
    "}\n",
    "data['贷款期限'] = data['贷款期限'].replace(term_mapping)\n",
    "data['贷款期限'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 268,
   "id": "7875fd92",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>房屋所有权</th>\n",
       "      <th>年收入</th>\n",
       "      <th>当前工作年限</th>\n",
       "      <th>税收留置权</th>\n",
       "      <th>开放账户数量</th>\n",
       "      <th>信用历史年限</th>\n",
       "      <th>最大开放信用额度</th>\n",
       "      <th>信用问题数量</th>\n",
       "      <th>距上次拖欠月数</th>\n",
       "      <th>破产次数</th>\n",
       "      <th>贷款目的</th>\n",
       "      <th>贷款期限</th>\n",
       "      <th>当前贷款金额</th>\n",
       "      <th>当前信用余额</th>\n",
       "      <th>月债务</th>\n",
       "      <th>信用评分</th>\n",
       "      <th>信用违约</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>482087.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>26.3</td>\n",
       "      <td>685960.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>47386.0</td>\n",
       "      <td>7914.0</td>\n",
       "      <td>749.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1025487.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>15.3</td>\n",
       "      <td>1181730.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>264968.0</td>\n",
       "      <td>394972.0</td>\n",
       "      <td>18373.0</td>\n",
       "      <td>737.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>751412.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1182434.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>308389.0</td>\n",
       "      <td>13651.0</td>\n",
       "      <td>742.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>805068.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>22.5</td>\n",
       "      <td>147400.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>121396.0</td>\n",
       "      <td>95855.0</td>\n",
       "      <td>11338.0</td>\n",
       "      <td>694.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>776264.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13.6</td>\n",
       "      <td>385836.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>125840.0</td>\n",
       "      <td>93309.0</td>\n",
       "      <td>7180.0</td>\n",
       "      <td>719.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   房屋所有权        年收入  当前工作年限  税收留置权  开放账户数量  信用历史年限   最大开放信用额度  信用问题数量  \\\n",
       "0      2   482087.0     NaN    0.0    11.0    26.3   685960.0     1.0   \n",
       "1      2  1025487.0    10.0    0.0    15.0    15.3  1181730.0     0.0   \n",
       "2      1   751412.0     8.0    0.0    11.0    35.0  1182434.0     0.0   \n",
       "3      2   805068.0     6.0    0.0     8.0    22.5   147400.0     1.0   \n",
       "4      0   776264.0     8.0    0.0    13.0    13.6   385836.0     1.0   \n",
       "\n",
       "   距上次拖欠月数  破产次数  贷款目的  贷款期限      当前贷款金额    当前信用余额      月债务   信用评分  信用违约  \n",
       "0      NaN   1.0     0     0  99999999.0   47386.0   7914.0  749.0     0  \n",
       "1      NaN   0.0     0     1    264968.0  394972.0  18373.0  737.0     1  \n",
       "2      NaN   0.0     0     0  99999999.0  308389.0  13651.0  742.0     0  \n",
       "3      NaN   1.0     0     0    121396.0   95855.0  11338.0  694.0     0  \n",
       "4      NaN   0.0     0     0    125840.0   93309.0   7180.0  719.0     0  "
      ]
     },
     "execution_count": 268,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63572b48",
   "metadata": {},
   "source": [
    "All features are now numeric; no non-numeric columns remain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 269,
   "id": "652909d8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 7500 entries, 0 to 7499\n",
      "Data columns (total 17 columns):\n",
      " #   Column    Non-Null Count  Dtype  \n",
      "---  ------    --------------  -----  \n",
      " 0   房屋所有权     7500 non-null   int64  \n",
      " 1   年收入       5943 non-null   float64\n",
      " 2   当前工作年限    7129 non-null   float64\n",
      " 3   税收留置权     7500 non-null   float64\n",
      " 4   开放账户数量    7500 non-null   float64\n",
      " 5   信用历史年限    7500 non-null   float64\n",
      " 6   最大开放信用额度  7500 non-null   float64\n",
      " 7   信用问题数量    7500 non-null   float64\n",
      " 8   距上次拖欠月数   3419 non-null   float64\n",
      " 9   破产次数      7486 non-null   float64\n",
      " 10  贷款目的      7500 non-null   int64  \n",
      " 11  贷款期限      7500 non-null   int64  \n",
      " 12  当前贷款金额    7500 non-null   float64\n",
      " 13  当前信用余额    7500 non-null   float64\n",
      " 14  月债务       7500 non-null   float64\n",
      " 15  信用评分      5943 non-null   float64\n",
      " 16  信用违约      7500 non-null   int64  \n",
      "dtypes: float64(13), int64(4)\n",
      "memory usage: 996.2 KB\n"
     ]
    }
   ],
   "source": [
    "data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1aacdae8",
   "metadata": {},
   "source": [
    "## 1.6 Filling missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 270,
   "id": "4b9512de",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "开始使用众数（Mode）填充缺失值...\n",
      "列 '年收入' 已使用众数 969475.0 填充。\n",
      "列 '当前工作年限' 已使用众数 10.0 填充。\n",
      "列 '距上次拖欠月数' 已使用众数 14.0 填充。\n",
      "列 '破产次数' 已使用众数 0.0 填充。\n",
      "列 '信用评分' 已使用众数 740.0 填充。\n",
      "\n",
      "--- 缺失值处理后的 DataFrame 信息 ---\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 7500 entries, 0 to 7499\n",
      "Data columns (total 17 columns):\n",
      " #   Column    Non-Null Count  Dtype  \n",
      "---  ------    --------------  -----  \n",
      " 0   房屋所有权     7500 non-null   int64  \n",
      " 1   年收入       7500 non-null   float64\n",
      " 2   当前工作年限    7500 non-null   float64\n",
      " 3   税收留置权     7500 non-null   float64\n",
      " 4   开放账户数量    7500 non-null   float64\n",
      " 5   信用历史年限    7500 non-null   float64\n",
      " 6   最大开放信用额度  7500 non-null   float64\n",
      " 7   信用问题数量    7500 non-null   float64\n",
      " 8   距上次拖欠月数   7500 non-null   float64\n",
      " 9   破产次数      7500 non-null   float64\n",
      " 10  贷款目的      7500 non-null   int64  \n",
      " 11  贷款期限      7500 non-null   int64  \n",
      " 12  当前贷款金额    7500 non-null   float64\n",
      " 13  当前信用余额    7500 non-null   float64\n",
      " 14  月债务       7500 non-null   float64\n",
      " 15  信用评分      7500 non-null   float64\n",
      " 16  信用违约      7500 non-null   int64  \n",
      "dtypes: float64(13), int64(4)\n",
      "memory usage: 996.2 KB\n"
     ]
    }
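   ],
   "source": [
    "\n",
    "# Columns that contain missing values and need to be filled\n",
    "missing_cols = [\n",
    "    '年收入',\n",
    "    '当前工作年限',\n",
    "    '距上次拖欠月数',\n",
    "    '破产次数',\n",
    "    '信用评分'\n",
    "]\n",
    "\n",
    "print(\"开始使用众数（Mode）填充缺失值...\")\n",
    "\n",
    "for col in missing_cols:\n",
    "    # A column can have several modes; .mode() returns a Series, so take the first entry\n",
    "    mode_val = data[col].mode()[0]\n",
    "    \n",
    "    # Assign the filled column back instead of calling fillna(inplace=True) on data[col]:\n",
    "    # the chained in-place form is deprecated in recent pandas and may not modify data\n",
    "    data[col] = data[col].fillna(mode_val)\n",
    "    \n",
    "    print(f\"列 '{col}' 已使用众数 {mode_val} 填充。\")\n",
    "\n",
    "# Verify the result: every column should now report 7500 non-null values\n",
    "print(\"\\n--- 缺失值处理后的 DataFrame 信息 ---\")\n",
    "data.info()"
   ]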
  },
  {
   "cell_type": "markdown",
   "id": "0f63e3d2",
   "metadata": {},
   "source": [
    "## 1.7 Handling outliers"
   ]
  },
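  {
   "cell_type": "markdown",
   "id": "62480575",
   "metadata": {},
   "source": [
    "1. We have not covered outlier handling so far. Whether to handle outliers is a judgment call that depends on how well you understand the real data. Leaving them in can also help the model generalize and stay robust, because data collected in practice often contains problematic values.\n",
    "2. Outliers are usually left untreated, or you can run a controlled comparison with and without treatment. Either way, describe this step in a paper, since it counts as part of the work."
   ]
  },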
  {
   "cell_type": "code",
   "execution_count": 271,
   "id": "b25b10e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>房屋所有权</th>\n",
       "      <th>年收入</th>\n",
       "      <th>当前工作年限</th>\n",
       "      <th>税收留置权</th>\n",
       "      <th>开放账户数量</th>\n",
       "      <th>信用历史年限</th>\n",
       "      <th>最大开放信用额度</th>\n",
       "      <th>信用问题数量</th>\n",
       "      <th>距上次拖欠月数</th>\n",
       "      <th>破产次数</th>\n",
       "      <th>贷款目的</th>\n",
       "      <th>贷款期限</th>\n",
       "      <th>当前贷款金额</th>\n",
       "      <th>当前信用余额</th>\n",
       "      <th>月债务</th>\n",
       "      <th>信用评分</th>\n",
       "      <th>信用违约</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>482087.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>26.3</td>\n",
       "      <td>685960.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>47386.0</td>\n",
       "      <td>7914.0</td>\n",
       "      <td>749.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1025487.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>15.3</td>\n",
       "      <td>1181730.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>264968.0</td>\n",
       "      <td>394972.0</td>\n",
       "      <td>18373.0</td>\n",
       "      <td>737.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>751412.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1182434.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>99999999.0</td>\n",
       "      <td>308389.0</td>\n",
       "      <td>13651.0</td>\n",
       "      <td>742.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>805068.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>22.5</td>\n",
       "      <td>147400.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>121396.0</td>\n",
       "      <td>95855.0</td>\n",
       "      <td>11338.0</td>\n",
       "      <td>694.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>776264.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13.6</td>\n",
       "      <td>385836.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>125840.0</td>\n",
       "      <td>93309.0</td>\n",
       "      <td>7180.0</td>\n",
       "      <td>719.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   房屋所有权        年收入  当前工作年限  税收留置权  开放账户数量  信用历史年限   最大开放信用额度  信用问题数量  \\\n",
       "0      2   482087.0    10.0    0.0    11.0    26.3   685960.0     1.0   \n",
       "1      2  1025487.0    10.0    0.0    15.0    15.3  1181730.0     0.0   \n",
       "2      1   751412.0     8.0    0.0    11.0    35.0  1182434.0     0.0   \n",
       "3      2   805068.0     6.0    0.0     8.0    22.5   147400.0     1.0   \n",
       "4      0   776264.0     8.0    0.0    13.0    13.6   385836.0     1.0   \n",
       "\n",
       "   距上次拖欠月数  破产次数  贷款目的  贷款期限      当前贷款金额    当前信用余额      月债务   信用评分  信用违约  \n",
       "0     14.0   1.0     0     0  99999999.0   47386.0   7914.0  749.0     0  \n",
       "1     14.0   0.0     0     1    264968.0  394972.0  18373.0  737.0     1  \n",
       "2     14.0   0.0     0     0  99999999.0  308389.0  13651.0  742.0     0  \n",
       "3     14.0   1.0     0     0    121396.0   95855.0  11338.0  694.0     0  \n",
       "4     14.0   0.0     0     0    125840.0   93309.0   7180.0  719.0     0  "
      ]
     },
     "execution_count": 271,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "attachments": {
    "image.png": {
     "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAFoCAIAAAAElhK7AAAQAElEQVR4AeydD3BV153f5WSdbIJe2i3eRTyGcRNAoumkIpLwlIG1kLYzgvGLiMGrgC1HSRVRqbhdoDHuhIjUyN6pRSvRP7a0yHSiWARCFhwpjwW1E2RhUWYWwQh327UAd7MOlsTGZHejtyR2YrNfOHB9/d7T00N6f+675/Pml+Pf/Z1zzzm/z0/WN+deSf7Iu3wgAAEIQAACFhP4SB4fCEAAAhCAgMUEEEKbik+uEIAABCAQQwAhjEFCAAIQgAAEbCKAENpUbXK1iQC5QgACSRJACJMExTAIQAACEPAnAYTQn3UlKwhAwCYC5DorAgjhrPBxMwQgAAEI5DoBhDDXK8j+IQABCEBgVgRyTAhnlSs3QwACEIAABGIIIIQxSAhAAAIQgIBNBBBCm6qdY7myXQhAAAKZIIAQZoIya0AAAhCAgGcJIISeLQ0bg4BNBMgVAtkjgBBmjz0rQwACEICABwgghB4oAluAAAQgYBMBr+WKEHqtIuwHAhCAAAQySgAhzChuFoMABCAAAa8RQAjTWRHmhgAEIAABzxNACD1fIjYIAQhAAALpJIAQppMuc9tEgFwhAIEcJYAQ5mjh2DYEIAABCKSGgE+E8CIfCEDAwwRS8+3KQ7OwFV8R8IkQqiYL+EAAAp4koH89MQh4mYB/hNDLlNkbBCAAAQh4lsA0QujZfbMxCEAAAhCAQEoIIIQpwcgkEIAABCCQqwQQwlytXBr2zZQQgAAEbCSAENpYdXKGAAQgAAGHAELooMCBgE0EyBUCELhDACG8Q4J/QgACEICAlQQQQivLTtIQgIBNBMg1MQGEMDEfeiEAAQhAwOcEEEKfF5j0IAABCEAgMQF/CWHiXOmFAAQgAAEIxBBACGOQEIAABCAAAZsIIIQ2VdtfuZINBCAAgZQQQAhTgpFJIAABCEAgVwkghLlaOfYNAZsIkCsE0kgAIUwjXKaGAAQgAAHvE0AIvV8jdggBCEDAJgIZzxUhzDhyFoQABCAAAS8RQAi9VA32AgEIQAACGSeAEGYc+QcL4kEAAhCAQPYJIITZrwE7gAAEIACBLBJACLMIn6VtIkCuEICAVwkghF6tDPuCAAQgAIGMEEAIM4KZRSAAAZsIZCXX3vCJxOtOTkbGxifcY16/eFlBd8ROHyG0s+5kDQEIeJFAb7i/+enWqSxqx1K1tesedZRs1+5WM0DxmscanLgJqj05eLq17Xk5jumW0YtvOJfWOgihtaUncQhAwHMEhs+PBAJz1oWqoqxy9cq+Y9EHvuD8Ag3rOXQkKo2Oru51X1gTCORHxfvC/ZXlq+obt0s+jY1evLx1R7PxFY8ab8/l7ITQHk5kCgEIQCAjBBYEC4LBeVEWyJ8Ttbh0q/iBSmleZ1e3HJkGqJVJ8HTyk1PftE1BYzomjo2PV4eqWr711P7OdtnePS3qOtzTJV+muC7tNITQzrp7N+t3fvKTv/s/r/3iMo9rvFsjdpZWAh1d365v3BZlzbufC86fF7Xu7l1PXfjTk1OZet3jO7q6g8ECRTTPwOBQb/iETKdGtbKTg0OKq9dOQwjtrPtMsk73PZHXLvzfr331tdpN/++Jf/1n9XUXNjz8s+N/ktZFr127ltb5mRwCMyDw5LYnjvcejGt67afXh+45Tw6eNg82o1rF3cN6w/06JjqRnoNHPpWfv7Rw8Y5tWxbML5B/4GD081VnsA0OQmhDlXMjx///bMv1S5ecvb779k/f+I9/mL6jYTgcPnPmjLMcDgQ8QsA5rvXeOrS5W53qxibGzT73d7bpBaGkUQ9Rj/d+121lpcWKq3d/R7sG66GoHp82NtTJd+xG3g23OXE7HYTQzrp7Luu//d+n35n40A92my3+zaunjJPC9vr1693d3RLCFM7pu6lIKDsEmhrqgvMLhs9fiGuTkciO7VuidjY2NiGBdNvoxctRY
/RqUNO6g2PjV93m7rLQRwgtLLoXU373r67G3dY7fxVHHeOOTDIoFWxra+MsmCQuhmWYgORKUldUuDiQn9+y6ylj7suiJYvdW1pauKg6tMZEdOwzTkX5KsWNr1ZzSl/luK2spHj5HSsqXOTustBHCC0suhdT/sjHPhZ3Wx+dIh53cDLBzs7OK1eumJE6FDbyST+BlpaWxy3+HD161Hy9JW71MHNsfOK2jd1xTOTDlxpppjo5OPT6xcvB+fOMKWgctYqffGVIkams71h/7x0bOHV6qmGWxBFCSwrt9TQ/uaQw7hY/MUU87uBkgnV1dQsXLjQjQ6GQdBFLN4Hm5uaXLP6sX7/efL0lbvUstHn3c8b0mlCPN42vVpfD50fkGBu98yvwBw4e7Qv3D995jqr5HV/xPe0vKDKV1W7coGOisdovbZhqmCVxhNCSQns9TQnefVW3n/A4ew189p/eF/qCc5kSZ+7cuTt37qysrEzJbEwCgVQRqChfub+j3Vhe3j0tu/698dVWh9ZUPLhKjrGy0uK8Ox91mcenahVTa0wKp8sE1tr+vJFVtR0vdicYaUOXVUJoQ0FzOMdPf+Ob//gPtv7WP18xZ+nSf/D5kgV1Xy3674n+L+1sUq2pqQmFQrOZgXshkCYCt1713XCr3d0uNHxu5OTg6UAg+nfw3fPs3dNiZFWttNPdZaGPEFpYdO+m/NvrH1n83J7P/tGLhXv/a/Bf1t/z0Y+mb68SwhUrVqRvfmaGwN0S0AvC+sbtZ8+P7O+8+WsP095eVLhowfybvyMfNVIv/8YmJmo3PhIV12XHrT9Do4XWrnu0+IFKY/IVkd8b7tcYCw0htLDodqScRJZ6TJrEKIZAIEMEzI936ogmx71k5eqV675Q5Y4Yf8f2Le6Do/Obgjrh7W1tqQ596JalhYvWPbRGj0wvTP3HaNZ9+Bazig0tQmhDlckRAhDIDQJuYXN2XLRkcVnJMudyKkciN1WX4kWFi6OkUUHMEEAIDQdaCEAghwmwdQjMhgBCOBt63AsBCEAAAjlPACHM+RKSAAQgAAGbCKQ+V4Qw9UyZEQIQgAAEcogAQphDxWKrEIAABCCQegIIYeqZpmpG5oEABCAAgQwQQAgzAJklIAABCEDAuwQQQu/Whp3ZRIBcIQCBrBFACLOGnoUhAAEIQMALBBBCL1SBPUAAAjYRSC7X1y9eHhufSG4so2ZFACGcFT5uhgAEIJAmAnvaXjh77sJUk3d0dTc/3Wqs9dZI47vbycnIVLcTdxNACN008CEAAQjkBoHhcxcWBOctLy0OBOaMXrqks+PYxLguHes7dmIyghAmVc20CmFSO2AQBCAAAQjMgEBZybLqUJXUbt1Da3R7sKBAl44pgiVJACFMEhTDIAABCHiRgI6G5r9ZMXDq9Np1jzrmxb16dU8IoVcrk3P7SsWG349E/vqVkz89euRn/cffeetKKqZkDgjkMIG+cL/7nZ/eBUYl0xvulwqa/35hWUnx/s52x6JGcpmAAEKYAA5dGSVw7U+OnX+4+vK3dv34v7S/8YfPvvboxre69qV1B9euXUvr/EwOgVkSCN56C+i89lteUhw1YV/4RGX5Kic4NjbumBPEmZYAQjgtIgZkgsB7kciP97bfePdd92JjPd+JnD/njqTQD4fDZ86cSeGEVk1FspkhYN4COq/9KlavjFq3qaGute1589Ohw+dHOl7sbt79nEyOToqB/Pyo8VzGJYAQxsVC8K4J/Ox/9s/Gxv7H/vff+WXsqle///3ZTKt7Y+e8fv16d3e3hDC2iwgEcotAWekyCV7vsX5tu+LBVfs72ms3PVIdWiNHdvbciOLYtAQQwmkRMSApAn/96qnZ2M9fi/9v7N/9xRuzmVb3Ru1eKtjW1sZZMAoLl7lLQKfGYZfgFS1ZdODQEZ0RdS4cOHU6FXn5fw6E0P81zkyGi1qenY0V/P6X4u7zH6383dlMq3ujpu3s7Lxy5faP4ehQ2Mgn/QRaWloet
/hz9OjRqC/CVF1ORiJj4xORSESOM6fOiNLCtV98VM9FW3Y95cRxEhBACBPAoStzBPI/98/iLjbnc5+LG59xsK6ubuHCheb2UCgkXcTSTaC5ufkliz/r1683X2+pb2/k1dQ2nBwc2rF9i3vyxoavSAXXharcQfwEBBDCD+DgZZHAx4PBBY/XRW1gbuXv/dbqiqjgLC/nzp27c+fOysrKWc7D7RBIOQEd79ym+XXac0eMr7ixQCB/6Ec/1LvAoiWLFdG5sOfQkfrG7aOXLjc1fHnrk7tGL15WHJuWAEI4LSIGZIhA8GsN/2Tvfws+WvuR3/zNeQ99YcnuZz7zrafTtHZNTY2Og2manGkhMAMCErn6xm1uGxsf7zn4x+6I8Yfj/QHS4fMjMilfU0Nd7cYN1aE1amtqNw+7Xh/OYFeW3IIQWlLo3Egz//OfX/CvGn/jk3MKvvrVf1i+Op2bzguFQitWrEjrEkwOgeQJBOcXHO89mIyVld78bcLqUFUwOM+Zf8e2LTod6qWg6VX8sU0bjvd+t6x0mXwsMQGEMDEfev1MQI9J/ZweufmagF4BSjudFPWY1PEdxz3ACeLEEkAIY5kQgQAE/EWAbCCQkABCmBAPnRCAAAQg4HcCCKHfK0x+EIAABGwiMINcEcIZQOMWCEAAAhDwDwGE0D+1JBMIQAACEJgBAYRwBtC8cYuPd3GPj3MjNQhAwHMEEELPlYQN5d2AAQQgAIHMEUAIM8ealZIlwIkwhhQBCEAgfQQQwvSxZeaZEuBEOFNy3AcBCMyAAEI4A2jckmYCnAjTDJjpvU2A3WWaAEKYaeKsNz0BToTTM2IEBCCQMgIIYcpQMlHKCHAiTBlKJoIABKYnkE0hnH53jLCTACdCO+tO1hDIEgGEMEvgWTYBAU6ECeDQBQEIpJoAQphqoswXn8DdRDkR3g0txkIAArMkgBDOEiC3p4EAJ8I0QGVKCEBgKgII4VRkiGePACfC7LFPzcrMAoGcIoAQ5lS52CwEIAABCKSaAEKYaqLMBwEIQMAmAj7IFSH0QRFJAQIQgAAEZk4AIZw5O+6EAAQgAAEfEEAIky4iAyEAAQhAwI8EEEI/VpWcIAABCEAgaQIIYdKoGGgTAXKFAATsIYAQ2lNrMoUABCAAgTgEEMI4UAhBAAI2ESBX2wkghLZ/BZA/BCAAAcsJIISWfwGQPgQgAAGbCMTLFSGMR4UYBCAAAQhYQwAhtKbUJAoBCEAAAvEIIITxqPghRg4QgAAEIJAUAYQwKUwMggAEIAABvxLwjxDew8cvBPQvm19SyVQe3l5HBcUg4GUC/hHCK3z8QuDX7/16fGzcL9mQxxUvfwdkbxAQAf8IYSEfvxC49957F336M37JhjwK9Y0GSyEBpko5Af8IYcrRMCEEIAABCNhAACG0ocrkCAEIQAACUxLwsBBOuWc6IAABCEAAAikjgBCmDCUTQQACEIBALhJACHOxaj7cMylBAAIQyBYBhDBb5FkXAhCAAAQ8QQAh9EQZ2AQEbCJArhDwFgGE0Fv1RB8r6AAAEABJREFUYDcQgAAEIJBhAghhhoGzHAQgAAGbCORCrghhLlSJPUIAAhCAQNoIIIRpQ8vEEIAABCCQCwQQwlRViXkgAAEIQCAnCSCEOVk2Ng0BCEAAAqkigBCmiiTz2ESAXCEAAR8RQAh9VExSgQAEIACBuyeAEN49M+6AAARsIkCuvieAEPq+xCQIAQhAAAKJCCCEiejQBwEIQAACvifgEkLf50qCEIAABCAAgRgCCGEMEgIQgAAEIGATAYTQpmq7csWFAAQgAAFDACE0HGghAAEIQMBSAgihpYUnbZsIkCsEIJCIAEKYiA59EIAABCDgewIIoe9LTIIQgIBNBMj17gkghHfPjDsgAAEIQMBHBBBCHxWTVCAAAQhA4O4J5K4Q3n2u3AEBCEAAAhCIIYAQxiAhAAEIQAACNhFACG2qdu7mys4hAAEIpI0AQpg2tEwMAQhAAAK5Q
AAhzIUqsUcI2ESAXCGQYQIIYYaBsxwEIAABCHiLAELorXqwGwhAAAI2EfBErgihJ8rAJiAAAQhAIFsEEMJskWddCEAAAhDwBAGEMENlYBkIQAACEPAmAYTQm3VhVxCAAAQgkCECCGGGQLOMTQTIFQIQyCUCCGEuVYu9QgACEIBAygkghClHyoQQgIBNBMg19wkghLlfQzKAAAQgAIFZEEAIZwGPWyEAAQhAIPcJJC+EuZ8rGUAAAhCAAARiCCCEMUgIQAACEICATQQQQpuqnXyujIQABCBgDQGE0JpSkygEIAABCMQjgBDGo0IMAjYRIFcIWE4AIbT8C4D0IQABCNhOACG0/SuA/CEAAZsIkGscAghhHCiEIAABCEDAHgIIoT21JlMIQAACEIhDwLdCGCdXQhCAAARyh8DrFy+PjU9kcb/u1ScnI7IEmxk+N+Lu1b2Jx7sHZ91HCLNeAjYAAQhAIA6BPW0vnD13IU7HrZCURtqT2EYvXr419kNNfeN23auQ2rXrHpUT1zRzfeM2p+vAoSOtbc87l47TG+7vOXREUzXvblVQY3SjnOanW4fPT7l5DfCUIYSeKgebmRkB7oKAdQQ6urplvcf6ZVt37DK+2ubdzykik99z8EgUF+nZ2Ph4cH6B4mqD8+dJxuTHmmYoK13mxMtKigdOnXYuHaeyfOWBO6tIDgcGh8xdk5FIMDjPGeZxByH0eIHYXhoJvP3222mcnakhkGYC1aE1LbuekhUtWWx8tWUlyxSRyY9aX0KlE9v+znYn3tRQ19nVrbgTMY4ifeF+9erIaEwHPj3qNL5pzciTg0PVoaqTrwxJ+c6eG3ls0yM6EWqkDqM1j20ufqDSmBns2RYh9Gxp2Fh6CRw9evTUqVPpXYPZ00GAOWdEQNqmh6KNDXU6CDoT6PRWu3GD4up1gnJ0mqwoX6mRElTH9ne0O74cDZPp+afu7XyxW760U/o3Nn5VbVlp8fHeg7JAIP9wT5d6vWwIoZerk0t7+5uzf5oqu/GrX/3tayOpmu3diatRHK9fv75v376XX345Ks4lBLxGQNKil22Otba9MLMd6qwmtdPRTYe8qBkkjetCVeodu/ODOXp8qnUD+fkaWVS46Oz5EamaY3rgqVOgLtUrkyIumF9QUb5KvkznUa2ix6o6FOq5q0yOJlGXlw0h9HJ1cmlvf/7Nb/zlC8+nxD7y8Y9feak7JVP9+c5vREGUCj777LOvvvpqVJxLCHiQgFRneWnxB1ZSPNUmPxW4qVtxe3U+27ajWRK4vKRYzy1jTe//pIU1tQ0SLQ3W41OJmZlK5znpoqORCp49d8G51PitTza/NT6hySWce/e0DJ8fUe/wzTFX1avZZqmCWjEDhhBmALItS3zuP7WV9Xx39va7/+vkA4e+P/t5NEMs+r1797755psmrkPh43zST6ClpSX9i3h3BT2EN19vM2jNAUuyZKxi9cq4k4xeupxAb4oKFx//wUGd4Tpe7DZW37TdOGpb25+X6Vw49KMfSvakXjrk6aGos5CekZ4c/ODHZCR1eqBqejV+eenNV5K6S8N0qXvHxia0VvVDVToXvjV+1T2VucuDLULowaKwpTQS2Lx58/33328WePjhh1/ik34Czc3N6V/EuyusX7/efL2lqe3s6pYKOnozGYnoTCZzLyeJ0gC95JM1fa1O4+UY27Fti3qdwWWlyyS6zqWcpYWLR0cvyzE2evGNpUsWG19tfv6c3vCJ5lu/OyFHFgwWaM7K8pUDrwwNDA5V3nlqqsGeNYTQG6VhF5kicN999z3zzDNVVVWZWpB1IJBGAnqMKe3ROcyssbRwkSSwvnGbzERiWx3UdNB04jq0OX5cRw9Oh+/8RqAml+nA54xUl14oTuoTicjf0/6ClFLKKkHVmIHB0+7BinjTEEJv1oVdpZdAbW2tjoPpXYPZIZB+Anp9uL+zXac9s5Segh7u6Tp+68c1TSSqlYxJt2o3bXDikUgkWHDz1wqdSJSjO
XXK1MNPxfWCULooxzFpcCAQaGr4ipxbbwrnVJTffn4rLQzOn+fszbnFgw5C6MGisKVMENADqwcffDATK8WsQQACqSJQUb4qeaWRmG39+q7HNm5w3/LW2IQeZibeT9GSRTrtacxAzKNOKetkZLL31t+XqW/cvmPbExomU1yKK6cvfEKtxw0h9HiB2F4aCegxaRpnZ2oI3CUBiYfbdLeOa+6I8RU31hfud36zItZRrxlmWt0rocrLu6Fzm4mYdmDwtN4CGj+qlXAOnxuRVYfWBPLnnLz5zu/0jbw8RWSaUOOlqXtbWwKBOZ1d39bB8fVLN/8+qm6U4upd4949LR1d3xmN95fedK93DCH0Ti3YCQQg4D8CyWYkXdGLPbeNjY/3HPxjd8T4w3f+AGn0L1e4f9GitFi9zto6sdXUbtbrOj1HDQTyDxw6YoTTSKPzMNMZbxxtqePWD5r2HTsh58D3jmgG4+tybOyqBmiGtV/cJCk9/oODh3v2KSIB1loVq1dKcSWTTQ1frm+6/ddNzbQebBFCDxaFLUEAAtYRkGaYd3vTtlIj0anduGHHti06dU1lTQ11TZvrNFKmV4k6t+3YfvsHRIsKFykiWxeqOhzvz74sLymuLF+pt4PmJ0vjttqG9qwZJIGNDXXSV13qTaF8tVpd68p0mtzf0aYu+Z41hNCzpWFjEIAABKYkIJWS9kzZnZcn7ZGZAXKkW8ZXW1Zy83ckjILGnaSsdFnF6tt/LEbjE5gmiZ3BvZbu1VbVetlSJoReTpK9QQACEIAABKYigBBORYY4BCAAAQhYQQAhtKLMqU6S+SAAAQj4hwBC6J9akgkEIAABCMyAAEI4A2jcAgGbCJArBPxOACH0e4XJDwIQgAAEEhJACBPioRMCEICATQTszBUhtLPuZA0BCEAAArcJIIS3QfAPCEAAAhCwk4CtQmhntckaAhCAAARiCCCEMUgIQAACEICATQQQQpuqbWuu5A0BCEAgAQGEMAEcuiAAAQhAwP8EEEL/15gMIWATAXKFwF0TQAjvGhk3QAACEICAnwgghH6qJrlAAAIQsIlAinJFCFMEkmkgAAEIQCA3CSCEuVk3dg0BCEAAAikigBCmCGR6p2F2CEAAAhBIFwGEMF1kmRcCEIAABHKCAEKYE2VikzYRIFcIQCCzBBDCzPJmNQhAAAIQ8BgBhNBjBWE7EICATQTI1QsEEEIvVIE9QAACEIBA1ggghFlDz8IQgAAEIOAFApkSQi/kyh4gAAEIQAACMQQQwhgkBCAAAQh4g0Bv+MS0Gxkbn3DGTE5GZM5lrDN8bsQd1L2Jx7sH+9hHCH1c3KylxsIQgEBKCOza3Zp4HglbfeM2Z8yBQ0da2553Lh2nN9zfc+iIZK/51oQaoxvV2/x06/D5C3IsN/8I4ft8skrAs/8ivXfjRlbBsPj7nv3a8MHGeo/1l5UucxIpKykeOHXauXScyvKVBw4eMZeSw4HBIXPXZCQSDM4zcZtb/wjhZT5ZJfD+e178ficR/PGP/yKrYHy/+PQJ2vwd9q5yP3vugo5obtPt7kv5J1/5QOckaX3h/qaGurXrHjWmA58edRrftJpBdnJwqDpUdfKVISnf2XMjj216RCdCjRy9eLnmsc3FD1Qa00g7zT9CuIBPVgl85KNe/Fq65557CuYXZBUMiy+w83vrDLKWsI1NjC8vLXZMkzi+HPWOXrqsoLGOru6K8pXB+QUtu55ybH9Hu+PLMSP1/FOTd77YrUtpp/RvbPyq2rLS4uO9B2WBQP7hni712mle/OZlZyXIOvMErl27lvlFWRECCQgECwp0dHNMIx1fjjRPEWN6HShJC+Tn67KocNHZ8yNSNcf0wFOnQF2qVyZFXDC/oKJ8lXxZWckyzabHqjoUBufPk8nRJOqaxnzajRD6tLCkNR2BcDh85syZ6UbRDwEvEtBhrrXteYmZ2ZzOc9LFMdePj549d8G5lMhtfbL5rfEJPUSVcO7d0zJ8fkS9wzfHX
FWvZrNZBcUQIRQEzC4C169f7+7ulhDalTbZ+oiA1EuHPPcBUc9ITw5+8PpQUmd+HEZJSyaXly7TeN2lYbqUPzY2ofNi9UNVOhe+NX7VPZVusc0QwrgVJ+hbAlLBtrY2zoK+LXCOJ6bDmd78OaZsHF+OehWRSeSc46AuZUsLF4+OfvD6cPTiG0uXLFbcWH7+nN7wieZbvzshRxYMFuzYtqWyfOXAK0MDg0OVd56amvG2tQihbRVPV77v/eIXVw4e+MuuP/KUvf/LX0Yl3NnZeeXKFRPUobCRT/oJtLS0PG7x5+jRo+brLa1tWUnx8J3fCNRjT5kOfM6K6uoL90/qE4nI39P+gpRSR0MJqsYMDJ52D1bENkMIbat42vJ9//13r1375U9/6il77/3oX+qoq6tbuHChoRAKhaSLnfwvzQSam5tfsvizfv168/U2bbu8tHjHv9uiN3mO6RbHl7Nj+5bK1SsVjLWiwsWTkdt/VkYvCKWL7jF6FhoIBJoaviJH8wTy51SU356nrHRZ8ObPyxS4x9vmI4S2VTxd+X50zpzPPPFvi77xTU/ZvZ/8ZFTCc+fO3blzZ2VlZVScSwhknYBe1JWVfPDb8bH7UW+R64Fn1ICiJYt02lNwIOZRpw6Ik5HJ3lt/X6a+cfuObU9omExxnRTl9CXxt9w0zK+GEPq1suSViEBNTY2Og4lG0OdTAv5La3IyMnxuRFYdWqOj3smb7/xO38jLU0QmqVPKkti9rS2BwJzOrm/r4Pj6pcuK68atX9+ld41797R0dH1n9OJljbTTEEI7607WeaFQaMWKFYCAQK4TkKR1vNgt6zt2Qu2B7x3RCz/j63Js7KoG6BS49oublhYuPv6Dg4d79inSF+6vqd1csXqlnpRKJpsavlzftF3xXKcxs/0jhDPjxl1+IKDHpH5Igxz8S0DHtQTJLS8prixfqbeD+zvapzKJonRuXahKEtjYUBcI5OtSbwrlq21qqDPzV4fW7O9oU5e59EubbB4IYbYOQ/YAABAASURBVLKkGAcBCEAgwwSkVQlWLCtdVrH69h+LSTBMXRJUSaAct0kj3ZcSVPelVT5CaFW5SRYCEIAABKIJIITRRHLxmj1DAAIQgMCMCSCEM0bHjRCAAAQg4AcCCKEfqkgONhEgVwhAIMUEEMIUA2U6CEAAAhDILQIIYW7Vi91CAAI2ESDXjBBACDOCmUUgAAEIQMCrBBBCr1aGfUEAAhDIKgH3H5qZnLz9F72n2tHwuRF3l+7VLe6Il32PCKGXEbE3CEAAAmknINno6OpOxjRSu5HSSHsSW9w/H1rfePtPqWmGtese1VRxTTPXN25zug4cOtLa9rxz6TjmD3lrKvMfO9QY3aje5qdbzV8Al+99Qwi9XyN2CAEIWEFAcuK2gVeG+sIn3BHjGxZGMnuP9cu27tilSzlqm3c/J0cmv+fgETPYaaVnY+Pj5k+pqQ3On9dzKHqMGawZyko/+E9hlJUUD5w6bbrcbWX5ygN3VtH2BgaHzF2TkUgwOM890ss+Qujl6vh0b55J69q1a57ZCxuxnUAgkN+y6ym3VaxeVVayzB0xvkYaWNWhNSZStGSx8dU6t8g3w5xWQqUT2/7OdifS1FDX2dWtuBMxjiJ94X716shoTAc+nUSNb1oz8uTgUHWo6uQrQ1K+s+dGHtv0iE6EGqnDaM1jm4sfqDRmBnu2RQg9Wxo2ll4C4XD4zJkz6V2D2SHgGQLSNj0UbWyo00HQ2ZROb7UbNyiuXicoR6fJivKVGmmE1rT7O9qNY1oNk+n5p+7tfLFbfl+4X/o3Nn5VbVlp8fHegzLJ9uGeLvV62RBCL1eHvaWFwPXr17u7u8PhcFpmZ9IPE+BqNgT6jvU3P/3c2PjEbCbRvTqrSe10dNMhT5dukzSuC1Wp11lFj0/7wv2B/HwNKypcdPb8iFTNMT3w1ClQl+qVSRQXzC+oKL/9t791HtUqeqyqQ6Geu8rkaBKN9LIhhF6uDntLP
QGpYFtbG2fB1JNlxjQQqH6oakGwoKZ287Ry+KnATd2KuwWdz7btaJYELi8p1nPLWNP7P2lhTW2DREuD9fhUYmam0nlOuuhopIJnz11wLjV+65PNb41PaHIJ5949LcPnR9Q7fHPMVfVqtqLCRbrL44YQerxAubS9Nw+8dOk/75m9XdzznGz282iGWHydnZ1Xrlwx8XA43Mgn/QRaWloet/hz9OhR8/U2s1YntsM9+/LuydOhrTfcH3eS0UuXE+hN0a3/Hq/OcB23/vu9auubtqs11tr+vEyrDP3oh5I9qZcOeXoo6iykZ6QnBz/4MRlJnR6oml6NX1568y2m7tKwQODma86xsQmtJQnXufCt8avuqcxdHmwRQg8WJSe3dN9DX/hV3j0psZ8cOfLuO++kZKr7QtVRNOvq6hYuXGiCoVBIuoilm0Bzc/NLFn/Wr19vvt5m3EpLJE46dfUdu/lDpFHzdHZ1SwU1xsQnIxGdyWTm0rRSLA3QSz5Z09fqNF6OsR3btqjXDFNbVrrMOQ7qUra0cPHo6GU5xkYvvrF0yWLjq83Pn9MbPtG8u1W+HFkwWKA5K8tXDrwyNDA4VHnnqakGeNYQQs+WJsc2Fqz/WqrsNz7xid/50sZUzfax3/4dN8q5c+fu3LmzsrLSHcSHgPcJSJ8kXdIz91b1GFPaI5k0waWFiySB9Y3bZCYS2+qgpjd5TlyHNseP6+jB6fD5C6ZLk8t04DOXatXVF+6f1CcSkb+n/QUppZRVgqregcHT7sGKeNMQwpnUhXtynUBNTY2Og7meBfuHwPLS4v2d7Y466ino4Z6u47d+XDMuHMmYdKt20wanNxKJBAsKnMtYR3PqlKmHn+rSC0LpohzHpMGBQKCp4StydGYN5M+pKF9peqWFwfnznL2ZoDdbhNCbdWFXaScgIVyxYkXal2EBCKSTQEX5quSVRmK29eu7Htu4wX3LW2MTepiZeI9FSxbptKcxAzGPOqWsk5HJ3nB/z6Ej9Y3bd2x7QsNkiktx5fSFT6j1uCGEHi8Q20sjAT0mTWJ2hkDAowT6wv3NT7dOZep171vKJKHKy7uhc5s7PjB4Wm8B3RHHl3AOnxuRVYfW6Kh38uY7v9M38vIUkWlCjZSm7m1tCQTmdHZ9WwfH1y9dVlw3SnH1LHfvnpaOru+MXryskV42hNDL1WFvEIAABKYkEAzO06PRqUy9zp294f6a2s16XafnqIFA/oFDR4x8Gml0HmY6440jSTM/Wdp37IScA987ohmMr8uxsasaoBnWfnGTpPT4Dw4e7tmniARYa1WsXinFlUw2NXy5vun2Xzc103qwRQg9WBS2BAEIZImAl5ZdXlJceed9W+y+ajdu2LFti05dU1lTQ13T5jpzo8RS57Yd22//gGhR4SJFZOtCVYfj/dkXs7TeDu7vaJ/KJIrSOc0gCWxsqJO+6lJvCuWr1epmaZ0m93e0qctcerNFCL1ZF3YFAQjYTqCsdFnF6tt/sSWWhVRK2hMbdyLSHpm5lCPdMr7aspKbvyNhFDTuJImX1gyOaZLYGdxraaS2qtbLhhB6uTrsDQIQgAAE0kXAmRchdFDgQAACEICAjQQQQhurTs4QgAAEIOAQQAgdFP51yAwCEIAABKYmgBBOzYYeCEAAAhCwgABCaEGRSdEmAuQKAQjcLQGE8G6JMR4CEIAABHxFACH0VTlJBgIQsIkAuaaGAEKYGo7MAgEIQAACOUoAIczRwrFtCEAAAhBIDYHcEMLU5MosEIAABCAAgRgCCGEMEgIQgAAEIGATAYTQpmrnRq7sEgIQgEBGCSCEGcXNYhCAAAQg4DUCCKHXKsJ+IGATAXKFgAcIIIQeKAJbgAAEIACB7BFACLPHnpUhAAEI2ETAs7kihJ4tDRuDAAQgAIFMEEAIM0GZNSAAAQhAwLMEEMI0lIYpIQABCEAgdwgghLlTK3YKAQhAAAJpIIAQpgEqU9pEgFwhAIFcJ4AQ5noF2T8EIAABCMyKA
EI4K3zcDAEI2ESAXP1JACH0Z13JCgIQgAAEkiSAECYJimEQgAAEIOBPAvGF0J+5khUEIAABCEAghgBCGIOEAAQgAAEI2EQAIbSp2vFzJQoBCEDAagIIodXlJ3kIQAACEEAI+RqAgE0EyBUCEIghgBDGICEAAQhAAAI2EUAIbao2uUIAAjYRINckCSCESYJiGAQgAAEI+JMAQujPupIVBCAAAQgkScAXQphkrgyDAAQgAAEIxBBACGOQEIAABCAAAZsIIIQ2VdsXuZIEBCAAgdQSQAhTy5PZIAABCEAgxwgghDlWMLYLAZsIkCsEMkEAIcwEZdaAAAQgAAHPEkAIPVsaNgYBCEDAJgLZyxUhzB57VoYABCAAAQ8QQAg9UAS2AAEIQAAC2SOAEGaePStCAAIQgICHCCCEHioGW4EABCAAgcwTQAgzz5wVbSJArhCAgOcJIISeLxEbhAAEIACBdBJACNNJl7khAAGbCJBrjhJACHO0cGwbAhCAAARSQwAhTA1HZoEABCAAgRwlMCMhzNFc2TYEIAABCEAghgBCGIOEAAQgAAEI2EQAIbSp2jPKlZsgAAEI+JsAQujv+pIdBCAAAQhMQwAhnAYQ3RCwiQC5QsBGAgihjVUnZwhAAAIQcAgghA4KHAhAAAI2ESDXOwQQwjsk+CcEIAABCFhJACG0suwkDQEIQAACdwjYIIR3cuWfEIAABCAAgRgCCGEMEgIQgAAEIGATAf8I4Vt8/ELg1+/9emJ8YobZcJv3CNj0HZVcc5KAf4SwkI9fCNx7772LPv0Zv2RDHoU5+a0x45uenIx0dHUnYxqp3Wlk89OtxlrbXjh77oLx3a0ZqcFYYgL+EcLEedILAQj4j4DPMhobn3DbwCtDfeET7ojxTdbD5y4sCM5bXlocCMwZvXTpZtfEuC4d6zt2YjISMYNpExNACBPzoRcCEIBAJggEAvktu55yW8XqVWUly9wR42uk2ZB6q0NVUrt1D61RJFhQoEvHFMGSJIAQJgmKYRCAAAS8SEBHw7LSYu1s4NTptesedUwRf1kas0EI0wiXqT1O4O233/b4Dtme5QT6jvU3P/2cHntOxaE33C8VDM4v0ICykuL9ne2OKYIlSQAhTBIUw/xG4OjRo6dOnfJbVuTjLwLVD1UtCBbU1G6eSg71ErGyfJWT9NjYuGNOEGdaAgjhtIgyPYD10k3g+vXr+/bte/nll9O9EPNDYPYEGhvqDvfsy7snr75xu85/URM2NdS1tj1vfjp0+PxIx4vdzbufk8nRSTGQnx81nsu4BBDCuFgI3jWBP/uDf5Mq+9XPf/76f2hO1WxRmUgFn3322VdffTUqziUEPEtATz5bdj0lzes7dvOHSN37LCtdJsHrPdavYMWDq/Z3tNdueqQ6tEaO7Oy5EcWxaQkghNMiYkBSBAqq16XKPtvybPD3a1I1W9Tu9+7d++abb5qgDoWPZ/ljxfItLS1W5DlFknoIb77eZtlWh6qkbRLFqHnKSpYNuwSvaMmiA4eO6Iyoc+HAqdNRg7mMSwAhjIuF4F0TuO/3/oU3LSqTzZs333///Sb48MMPv8Qn/QSam5vTv4h3V1i/fr35ekt5OxmJjI1PRCIROc7kZaXLpIVrv/ionovqHOnEcRIQQAgTwKHLhwTuu+++Z555pqqqyoe5kZLHCaR8ezfyamobTg4O7di+xT13Y8NXpILrQnyRu6kk8hHCRHTo8yuB2tpaHQf9mh15WUIgEMgf+tEP9by0aMlipaxzYc+hI/WN20cvXW5q+PLWJ3eNXrysODYtAYRwWkQM8CcBPbB68MEH/ZkbWdlHYPj8iEzK19RQV7txQ3Vojdqa2s3u14f2UUk243QIYbJrMw4C2SWgx6TZ3QCrQyABgeUlxZXlK6caUB2qCgbnOb07tm3R6VAvBctu/ZUZxR/btOF473f1ylA+lpgAQpiYD70QgAAEskNAGlax+oNflo/ahF4Bun+CVI9Jowbo0j1Al9hUB
BDCqcgQT44AoyAAAQjkOAGEMMcLyPYhAAEIQGB2BBDC2fHjbgjYRIBcIeBLAv4Rwot8IAABTxLw5bdOkvITAZ8IYSEfCEDAwwT89E3TmlwsStQnQmhRxUgVAhCAAARSSgAhTClOJoMABCAAgVwjgBDm5VrJ2C8EIAABCKSSAEKYSprMBQEIQAACOUcAIcy5krHh2RDgXghAAALRBBDCaCJcQwACEICAVQQQQqvKTbIQsIkAuUIgOQIIYXKcGAUBCEAAAj4lgBD6tLCkBQEIQMAmArPJFSGcDT3uhQAEIACBnCeAEOZ8CUkAAhCAAARmQwAhnA29bNzLmhCAAAQgkFICCGFKcTIZBCAAAQjkGgGEMNcqxn5tIkCuEIBABggghBmAzBIQgAAEIOBdAgihd2vDziAAAZsIkGvWCCCEWUPPwhCAAAQg4AUCCKEXqsAeIAABCEAgawSyIIRZy5WFIQABCEAAAjEEEMIYJAQgAAGDzxw7AAAAGklEQVQIQMAmAgihTdXOQq4sCQEIQMDrBP4eAAD//zrPBIAAAAAGSURBVAMA8ZeDNEH+i+QAAAAASUVORK5CYII="
    }
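   },
   "cell_type": "markdown",
   "id": "97a2d88a",
   "metadata": {},
   "source": [
    "Here we handle outliers in 年收入 (annual income) as an example. We use the interquartile range (IQR) method, which defines the outlier boundaries on the same principle as a box plot: with IQR = Q3 - Q1, values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are typically treated as outliers.\n",
    "\n",
    "![image.png](attachment:image.png)"
   ]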
  },
  {
   "cell_type": "code",
   "execution_count": 272,
   "id": "7e9b3cb2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAoQAAAIhCAYAAADXZqsSAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAALtFJREFUeJzt3Qe4XGWdP/A3IUAIkNA7hN5lQbq6dKTDoqh0EQGX8gAqunQWCGUVEFR67wi4dFCqwIKAQqgRUAg19BKCBEjI/J/fu/9zdzK5ZZLcubmX9/N5nslkZs55z3vOuXfud95ypl+tVqslAACK1X9aVwAAgGlLIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBHqtF198MX3wwQdtj8ePH5/uueeeNGHChNRXfPHFF5M85wuigN5GIARa5q233kpnnXVWvp8SW2yxRbrmmmvaHj/00ENpgw02SGeccUbTZTz66KNpxIgRU7T9hx9+OF1xxRXp448/nuS1O+64I913332dhru///3vabHFFsshtt73v//9tOyyy+Z1n3322TR27NhJ1o3w++GHH072rS+FZaD3GDCtKwB8ef3mN79Jxx13XJp55pnTLrvsMtnrzzTTTGmGGWZoe3zLLbekIUOGpN12262p9T/77LO06667ptdeey1de+21aeONN2562xHI9tlnnzRq1Ki05ppr5sfR2jdgwIDUr1+/tOOOO6ZvfOMbaZ111umwjDPPPDPNMsssef16c845Zxo0aFAOhDvvvHNaZpll0uWXXz7RMg8++GBad9110+R6+umn0worrDDZ6wFl00IItMRzzz2XTj755LTeeuulrbbaqt3WrPfffz+98cYb6ZVXXsnrPPHEEzkgjR49Oj/u3///3qIijF166aVp9913zyGr/vloYYvw12jGGWfMwWq11VZLW2+99SQtdZ35j//4jzR8+PA0ZsyYtNRSS6WFFloozT///GnYsGHpmGOOSe+++266/fbb02yzzZZvs846a9ppp53a1o9W0XPOOSffIvjV1y9CbYTd2L8IgvF6YytkBM8Q24/X4xbH6+ijj07vvfde23PV7dVXX83L1wdogGYJhEC3i4D2ve99L40bNy499dRTackll8y3CFXROlY9XmKJJfJtlVVWyetF0ImANHDgwEnKvPXWW9Prr7+efvWrX+UWuuoWwSla20444YR26xLhK1oWV1555VynCFWdiYD5s5/9LG/nggsuSH/5y1/aWt4+//zzXNdTTjklXXzxxbk+Cy64YDr88MPTCy+8kH7961+3lRNlfOtb38qtiCeddFJuBazfdmwnWi6jW3nRRRfNXeE33XRT2+vTTz/9JHX7r//6r7yfsU59sK7vtp5uuum6ODsAk9JlDHSrCCd77bVXbu2L1rE999yz7bXf/va3ORy99NJL7a5btW5Fy157Yejf/u3fcjdsJbpGoxUyulYHDx7cYZ0iYJ5//vl5+eOPPz7XoSPR0hYtg7/73e9yy+Lf/va3tO222+au4whuMebv2GOPzV3R4cc//nHe53nmmaetjKhjBNgnn3wyvfPOO7nrPJa7+eab8/o33HBDHtcY9YmA+S//8i+5vOWXX77DQBjHM0JqhNK11lqr7fmvfOUr6fHHH297XN+qCtAsgRDoNhGMYtzdZZddlgNQfRhsXC4mP0Qr2aeffppb+Kou0vZEV29MKIlJJjHz+Gtf+1pufYwZyCuttFKeuNGVBx54ILcoxiSXQw89NM0xxxztLheh784778z/jxB5xBFH5DBYTVCpWgF/+ctfTrTeVVddle666678/5gIE62d0SIaAS3qePDBB6fNNtssl7/66qvnruCXX34516kS67Qnuqe/853v5H2ObURr4iOPPJI23HDDXEchEJha3kWAbhPhJsb4xTi3mMBRdelWtwMOOCCHoOjWjMfREhhduhH2OhKh8cAD
D8yTUj766KO2WcfRbRrBspkwGOEzWteinKjTeeed19T+RCtdfUtd/D/GNzaOhTzooIPypJPKqaee2tY6OO+88+ZZ0bHPMeYwWk2//e1v5/Wef/75dPXVV+ewGCExWhmr8FkvAmy8FqEzWiujhXObbbbJLZGxHsDU0kIIdKsIKHGrJnDUB6XGLuOqhbC9MYOVCFKHHHJIWn/99dMll1ySWx9DdOsuvfTSafbZZ++yThGkIkDGpWKiZS5aCX/60592ON4uunEjbEb4e/PNN3OXbgTJatvtqW/pixbBEAE4JtRUXbynnXZaeuyxx/LlbKIeMfs4lo3xjXvvvXe+jwDZKLqcY1JKHKuoW4xP/OpXvyoMAt1GCyHQEs1Mbohl4pI0XS27/fbb56AUrWPR8hYta3/4wx9ySGxmgku0sG233XZ5Ash+++2XRo4cma688soO14lZxDHWMEJhBM7oHo7HlfpJLXGLbuVGF110UW79O/HEE3OYiwklMQklZixHC+Jcc82VX49rGa699tp5VnaMB2w8FlH/qOu+++6bW0PjOESLaoTHFVdcMa/7k5/8JLdGAkwpgRDoNtEa+M9//jNPfGhWdUmWTz75JLeehZhgES1hMZGiXsxMjla1CGf//d//nWcNd+Woo47KY/BiUkqI1r5vfvObuVs7ttmeaNWL8Fh1b0cLXf3lXBov+RKTTOrFGMQf/OAHuYt7kUUWyS2ghx12WPrFL36RZyRvuummecJKXBInWgtjrGW0eL799tuT1CXWjfAbQTDGDUaAjtbEaB2M47XlllvmsYdzzz13k0ccoB01gG5y2223xfVPpui2zDLL1A4//PDasssuWzviiCNqDzzwQG3llVeuXXjhhRNt49xzz83Lr7jiirUJEyZ0Wp877rij1r9//9ovfvGLiZ5/6KGHchm77rpru+tFueuvv36+DR06tHbdddfVBg0aVDvvvPM6rP+6667btv5HH31UGzZsWO3mm2+ujRgxojZ69OhJtvGb3/ymNt9889VmmWWW2o477lgbP378RK8PHz48lztmzJi258aNG1d76623ascee2xtnnnmqR133HFtr7366qt5+ZEjR3Z6TADao4UQ6DYbbbRRbhWLbs5oLWxsSYuWtqFDh070XEz4iNmz0doVrXlxOZboso2ZxO11Jce1+0LMEo71OhLXDYyZudEaGF2q9aKVMVrwYkxiddmYetGaGGMOqwtNx+VuYjJLjNtrpoUwLlIdLYIxqzgmz8T1BaOrOrqFKzEeMMYnxhjCuOB27GtX33EcxzcusB1l3n333XmSDEB3EAiBbhNBJcJQdHM2e4Hk+otLV18L154ImBHsInzFt4XExJRNNtmk7Rs6Gi8xE9+QssACC+TrCbZXl7h8THTbxni+CFlVOTE+L8YcxgSQ+m7izTffvNNL49SHuRhzuNxyy+V9inGBZ599dv5/jEeMsYQxYzrGHcZ9XPj6r3/9a14vQvCNN97YNuGm0bnnnpsvdh0X747xg1U3eD3fZQxMCbOMgR4TIaerVrDG5eMWY/JiVnCM6Yuxg3HJlQhTcR+TNGLSSUy6iIkW0coYgWzxxRdPt912W4cXrI6vv4vLwMS1/OKC0fH1eRHMYkZwTAiJ4BX39cGsmjHdXmiNMFeJS+784x//yDOqo/wIyDE+MC59E5eKqSaGxHjGeC1aMSPcxnNVfattxXUXI0xW2416xS2WjcAb2wnR2li/HsDkEAiBHhNdvO21fHUkulVjRnCEqJiVG9/wERd2DjFZIyZZREvfFVdckZ+P6wRGIIqWvwh58TV5nYkWu/vvvz/98Ic/zNcCrK45WH0LSUx2qf8O4ur/jaE2Wiyvv/76tscRTKNVsF60nMYkkug6ju7k+C7jEPsWITBaKn/0ox/lls1QHacIjZNDIASmRL8YSDhFawJMpvge3viquZj1OzliNnDVStZM6Ixbs8t3Jq4VGBeQXnjhhVOrNV6PMS5HE1/JF3WI1syu
xNjKqGfMzK6ugwjQLIEQoBeKiTlx3cIllliiw3GVAN1FIAQAKJxZxgAAhRMIAQAKJxACABRuii87Exc/HTVqVL6UggHPAAC9T0wViasVxHVL+/fv3/2BMMJgT1yKAQCAqRPfxrTQQgt1fyCMlsFqAx19EwAAANNOfL98NOBVua3bA2HVTRxhUCAEAOi9uhreZ1IJAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCDZjWFfiyeuutt9Lo0aOn2faHDBmS5p133mm2fQCg7xAIWxQGd95l1zTu88+mWR2mn2HGdNmllwiFAECXBMIWiJbBCINjF183TRg4JPUf+2GaaeR9aexi66QJM83W8u33/3R0Si/em+shEAIAXREIWyjC4ISZ5/q/xzPNNtFjAIDewKQSAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOH6TCD89NNP0/PPP5/v+XJzrgGgZ/WZQPjKK6+kvfbaK9/z5eZcA0DP6jOBEACA1hAIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QZM6wpAo/fffz/f77XXXt1e9oABA9L48eObXn7mmWfOy3/22Wdtz0033XSpVqulCRMmTLRsv3790qyzzpqWXnrp9N5776XX
X389Pz/nnHOmJZZYIr3wwgt536KsWDYMGjQol9W/f/80bty4XHaUG8/Hequsskr69NNPc3mxXpTzySef5MfxfNTtpZdeSh9//HEuY/bZZ8+3V199NS8fZc8000xpxhlnzP+P8qPszz//PD8X24j6jh07Nq8/77zz5rq9+eabuX5LLbVU+vvf/962r7H+H//4x1zXGWaYIS233HL5tdlmm61tn7744ou877G92Mbiiy+eFllkkbTlllump59+Ot1+++15H+aYY45clzgmUY94Luqw4IILpm222SaXdcMNN6RRo0al+eabLy266KLpySefzHX74IMP8n4tv/zyebsjRoxIAwcOzMcnyv3oo49yneaaa6600kor5XpXdYsyYpuxXLwW4rl33303ffjhh3m9eC3E4/j/CiuskJ555pmJlmksu17jdmL9J554Ih+7N954Ix+XOE/zzz9/PsexHzfeeGN66qmn8n4sueSS+fWO9qMj7e1frNPZfjcu24z4+anOzQILLJDPV/w81L9W/fzHz8g888wzWeU3s0/dsW5Xx6uz8x3LPPbYY/nnOX5/vvKVr6Rtt9227Th0VqfHH38838LKK6+cb2Fy9nNqjksz2vsZjt+BeBzHo/73o9nfg+6sY6v3v1V6c7371eJdewrEG9WQIUPS6NGj0+DBg1OrPf/88zkgnHPOOfkPWG9W1fWfy2+dJsw8V+r/z3fTzCNubHvcatX2+sKxarTxxhvnsEHZqnA5hW9PE4kwuc8+++T/n3HGGW1hN9T/YetMFRI6KnudddZpe+6+++6bZDuxP1O7L+1tq15724111ltvvfSnP/2py/3uqvzKWWedla655pqJjkccn+985zv5/42vTW75zexTM+V0te7kHK/G9cOJJ56YP8TUiw803/3ud9O///u/d1inU045ZZKft/hgFEGy2fMxNcelGe2V39HvQEfbbmUdW73/rTKt6t1sXtNlTK8hDJarCoCVCE9xixa/9lp54w9oo8bnonUzyo03wqOOOiodeeSRubXy9NNPT7feemvac8898x/g6o/wmmuumbbeeuu29aNVsxKtdlWZBx10UF42VGXHG32I+3hcbeewww5r25960ULYkWhN62w/qm3Va9xu7F/cxzpXXXVVvm9vv+P/1bKxbkfl14fBKC/+qMRx+P3vf5/v43E8H7fqWK222mppxx13bAufEZa6Kr+ZfWqmnl2tG/vR2fGKuoY4z9X5rs5B/BzFLcJg9Ajsv//+6fjjj88/q9FaHutH+R3VKY57tCaefPLJORzGz1mU1ez5mJrjMiXHPX6GY9/rg0TUP27xfNS5cdutrGOr979V+kK9tRC2gBbCyRddT/HHg2kvwkrVRR5dudE9PSWmn376/Ie1vru9Eq0h0U32yCOP5Mex3Oqrr5674OJDQfUHKN5fopxQPV+1trX31rXGGmvkLu7oUo5gFX+kR44cmbtnYr2bb745lxctHTvttFPuhh4+fHheN7o5d9ttt7TYYovlxy+++GL+Ix3bie7bqpv98ssvz2UdfvjheZlYPrrtL7744rTrrrvmN/hhw4bl9eJn+q233mrbxwgMa621Vn49/tA+/PDDkxyX2FaU+dxzz+X9iG78KDO2EfWN+8suu2yirsvYl2q7VZiJ52P7Ue8IabFOiGWrfawvK+oW+xTHq778SpSz2Wab5fMSrYARzCsRaDbffPP8/+hajfedqi4xrCFaD+NvRpzjl19+ud3y63W0T6Grena1bhz3+LmLn5XjjjtuouMV68XPa/zcRQisXq+2Gec7upFjn+I81R+HWObQQw/NZcfPx2233dbWfVyVHefzq1/9alu59ecojmGUGT9fHZ2PqTkuzWgsP36Gq8cRWqoPTPF7FOVX27vkkkvy6/H/xt+D7qxjq/e/VaZ1vbu9hTB+SaLQ+tu0EG8mEbh68y3q2Bv0hWNV3fbYY49pfbh6jSoATSv1Aa4+DHbWqtWeCHDthcEQfwAX
Xnjhid4UF1poobYW4vhDFGPrqnKq5zfaaKO8bEefY6PM3XffPf//7bffzn/0o3smthd1iTGMoRqLGH/04/m43XTTTfm5nXfeOd8iyMXzsW78/4c//GF+PdaNN/R4g4/HsY0YFxiBMh7H8/F6LFeFwWofQ2wzQkTsb3vHJdbZZZdd2vYjHlfbqO6j7Eq1L9V265+PdaOcap1q2Wof68uq9qmx/ErsX/xhi+NQHwbDLbfc0vb/CEz1dYllow6xbowP7aj8eh3tUzP17GrdOIZRl7hvPF6xXvRUxLmqf73+fFdjkGO5+uMQy8R5i3Wj/DhejWXHz1Mc96rc+nMUr1U/Xx3t59Qcl2Y0ll//OMbpVr8r8XtUv714XP2/8fegO+vY6v1vlb5S76YnlZxwwgnp6KOPTtNafLKiOY5V3xSfEHtj13l0j3UU8KZEhJ96jWVXXY/1YnzWHXfc0WmZa6+9dochtpqwVN3Xvx6t1KFqPWtUlVutWy1XldG4frVco2r5zo5llBEtg43rVPf1ZTfWp/H5xno3Ltve8+3Vvdq/+uPb+Fp75devU+1zR8emq31qpp5drdt4LBvXi0ky7f38NZYX4bajejUek66Off0x7ex8TM1xaUZj+fWP//znP3e6XLUPXf0eTU0dW73/rdJX6t10IDzkkEPST37yk7bH0UJY/wm/p0Rz/9ChQ1NvFi1zvSGM9YVjVTnwwAMnGaBdqo4Gbk9rY8aM6dbyGmdjNv6BjlnUja6++uouy6z/w9UYuqrZw9V9/etVEIium/ZU5VbrVstVZdSvHzMyq+UaVct31uIaZUT3ZOM61X192fX1ie02Pt9Y78Z9bO/59upe7V+UFzPG23utvpz6ulR1qPa5o2PT1T41U8+u1m08lo3rVYGm8eev8eeimkXd0TL1x6TxGFf1ajxH7S1b/9zUHJdmNJZf/7ijn7nqcfX/xt+D7qxjq/e/VfpKvZsOhPGLPLldRq0QAaevjIub1vrSsTrvvPOMIfz/pnXrYEdjCCe3dbCrMYRxaZxKLPfaa6/ldaqxgtXYvvoxhHfeeWdetqMxhFHm3Xffnf8fYwhjPFfM4qvGEK644or5tbjUQzwfY/iq97WtttoqT5KoxtpFC139GMLzzz8/rxPrRrdgjPWKx7GNuHxMXHol1o/nY5xQLBdlNI4hjG3GtmJ/2zsusa1LL700jyGs6lFto7qvLhtTvy/VdqsuqWr7F1xwwUTrxLLVPtY/X+1TY/mV2L+YLBHHYdNNN52ou3SLLbbIA+SrMYT1dYku1qhDtHxHiOqo/Hod7VMz9exq3TiGUZe4j32qP16xXrRAx3P1r9ef72oMYSwXY8XrxxDGeYvl42etunRSfdkxhjCOfTWGsP4cVZci6ux8TM1xaUZj+fWPY4xg9bsSv0f124vH8Xp7vwfdWcdW73+r9JV6m2VMrxCfKqf12Dk6H0M4uboaQ1hNKAkRuiIo1Y8hjAHQMQi7fgxhdc3GuAZhe7OMo8z4oxtiEPVDDz2U76u6xB+tuJZa/D9aueL1alxUvBbjxqK1Jm5VMI66xnUeo9zoNo/ZgdFj8uCDD7ZtY++9985hLi4fEevGIPFnn312orGx1RjCWD4mYDROKKmOS+xjLFPtRwwCj8fVfWyrfuB5/L9+u7F/0doe248ZvlFOrBuPY38ijFb7GEEunot1Yt14rrH8SuxfTA6J8uI+xlxGOIr7GANViWvyxbH5+c9/ns4+++y03XbbtU2OiX3uqPx6He1TM/Xsat04hlH/uG88XnE+o67xxzvWj/Mc+xf38Ther8YQxnJx3cEIPw888EA+11FmnOcov74FvKpTnN8o54ADDkiPPvpovhZh/JxFWdW56ex8TM1xaUZj+XFMYsxoPN5+++3bfleitzD2oWotjt+datuNvwfdWcdW73+r9JV6m2XcAmYZTzmXnqG7r0MYn7zjzTY0XgMsWmRiG1N6HcKq7J64DmF726rX3nZjnXXXXbep
6xB2Vf7UXoew2fKb2admyulq3ck5Xo3rd/d1COMC+PGBuNnzMTXHpRXXIWz296C76tjq/W+VaVXvZvOaQNgCAuHUiU/ZBx98cEvK9k0lvqnEN5X4ppJmj5dvKvFNJd1tWtRbIJyGBMJyzjUA9Ga+qQQAgKYIhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFC4PhMIF1lkkXTOOefke77cnGsA6FkDUh8xcODAtPTSS0/ratADnGsA6Fl9poUQAIDWEAgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMINmNYV+DLr/+no/70f++FE9z21XQCAZgiELTBkyJA0/QwzpvTivRM9P9PI+3qsDrH9qAcAQFcEwhaYd95502WXXpJGj552LXURBqMeAABdEQhbJMKYQAYA9AUmlQAAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQOIEQAKBwAiEAQOEEQgCAwgmEAACFEwgBAAonEAIAFE4gBAAonEAIAFA4gRAAoHACIQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgMIJhAAAhRMIAQAKJxACABROIAQAKJxACABQuAFTumKtVsv3H330UXfWBwCAblLltCq3dXsgHDNmTL5feOGFp7QIAAB6QOS2IUOGdPh6v1pXkbEDEyZMSKNGjUqzzjpr6tev31Sn1wiWr776aho8ePBUlcW04Rz2fc5h3+b89X3OYd/3US88hxHzIgwusMACqX///t3fQhiFLrTQQqk7xcHrLQeQKeMc9n3O
Yd/m/PV9zmHfN7iXncPOWgYrJpUAABROIAQAKFyvCIQzzjhjOuqoo/I9fZNz2Pc5h32b89f3OYd934x9+BxO8aQSAAC+HHpFCyEAANOOQAgAUDiBEACgcD0SCJ9++um0+uqrp9lnnz397Gc/6/LrU8K1116bhg4dmi+keOWVV/ZENenmc3j00UenOeaYIw+u3Xbbbdu+3Ya+cw4rH374YZp//vnTSy+91NI60ppzGF8k8LWvfS2dfPLJLa8j3Xf+4vW99947v4/ONttsabfddktjx47tsfrSvnfffTcttthiTb8f3nvvvWm55ZZLc801VzrllFNSsYHws88+S1tttVVaddVV01//+tc0YsSIdNFFF3X5S7PTTjulI444Iv3xj39MRx55ZHruuedaXVW68Rxefvnl+faHP/whPfPMM+lvf/tbOvHEE3uszkz9OawXf7zefPPNltaR1p3Ds846K40ePTrtv//+La8n3Xf+Lr300vy3b/jw4en+++/P76UnnHBCj9WZ9sPglltu2XQYfOedd9LWW2+ddthhh/TnP/85/1285557Uq9Ua7HrrruuNvvss9f++c9/5sePP/547etf/3qn6xxwwAG1TTbZpO3xqaeeWjvssMNaXVW68RyecMIJtQcffLDt8ZFHHlnbbLPNWl5Xuu8cVu69997aPPPMU5tzzjlrI0eObHFN6e5z+Prrr9eGDBlSu+uuu3qglnTn+dt3331rp59+etvjYcOG1XbYYYeW15WObbjhhrXTTjstmnabej/81a9+VVt22WVrEyZMyI+vv/762k477VTrjVreQvjEE0+ktdZaKw0aNCg/XmmllfIno67W2WCDDdoer7HGGunRRx9tdVXpxnN48MEHp7XXXrvtcXzKXWqppVpeV7rvHFatGj/60Y/Sr3/96zTLLLP0QE3p7nN44IEH5uE38d2qDz74YA/UlO46fyussEK67LLL0ltvvZVefvnldNVVV6WNN964h2pMe84999zJammP877++uunfv369fo8078nvug5+torcVCmm2669MEHHzS9Tnwf4KhRo1pdVbrxHNZ7/vnn03XXXZf22muvFtaSVpzD448/Pi299NLpe9/7Xg/Uku4+h9FFdc011+TvnX/hhRfS97///bTffvv1UI2Z2vO3xx57pI8//jjNN998adFFF83rxzlk2lms7hw2oy/lmZYHwgEDBkxyxe6BAwemTz75pOl1ulqe3ncO6wez77777vmNLT7t0nfOYYz7jLFnZ555Zg/UkFacw2jNWHPNNdPNN9+cjjnmmHT33XenM844w5jsPnL+TjvttDyZJFoHX3nllTR+/Pg8npe+Y0AfyjMtD4QxOyoGVdaL2aYzzDBD0+t0tTy97xxWjj322PT++++nX/7yly2sId19DmN2Y7ToDhs2LM/0p2/+Hr722mtp8803b+uuWnjhhdPcc8+dWwvp/ecvJiBEAFxkkUXyuYsJJeeff34P1Jbu0pfyTMsDYUyxj26LysiRI/O4pDhIza4TM6wWXHDBVleVbjyH4aabbspT7H//+9+3jZuhb5zDaI34n//5n/zHKFoo4hbPxbinK664ogdrztT8HkZXcf1lSqL7MT6geT/tG+cveljefvvttscx0/+LL75oeV3pPn0qz7R61sq4ceNqc889d+2CCy7Ij/fYY4/alltumf//wQcf1MaPHz/JOjH7auaZZ649+eSTtTFjxtRWXnnl2kknndTqqtKN53DEiBH5HF588cX5HMatml1H7z+HsXzMoKu/LbjggrX7778/n0v6xu/h7bffnmeH33nnnbWXXnqptssuu9RWXHHFthmP9O7zF7OMl1xyydqFF15YO/vss2uLL754bccdd+zxujOpxlnGo0ePrn3++eeTLPfOO+/UBg4cWLvjjjvy65tuumltv/32q/VGLQ+E4YYbbqgNGjQovzHFL8QzzzzzvxtPqTZ8+PB21zn00ENrM8wwQ23w4MG1VVdd
tfbJJ5/0RFXppnN44IEH5tfqb0OHDp0GNWdqfg/rxflz2Zm+dw7PO++82lJLLZX/KK211lq1Z599todrzZSevwiKEeJj2Th/22yzTQ4Y9L5AOHTo0HxpofaceeaZtemnnz5fdmixxRarvfnmm7XeqF/80xMtkdHUHVOtY9r9nHPO2dQ6MSX/9ddfT+uuu26v7XMvyZScQ3oX57Dvcw77NuevTCNHjkzPPvts+td//ddeewmvHguEAAAU/F3GAAD0XgIhAEDhBEIAgMIJhMCXxrhx4/Jg/VtuuWWKy4ivmorrwzUjrusH8GUgEAJfGnER9Jdeeil9/etfb3qdyy67LG277bZtj08++eS8flfz7eI7uhdffPH8bRIA3e3dd9/N34Mc72ld+c///M/8jUSNtz/96U9Nb08gBL4U4lsd4qv2Ish94xvfSCuuuGJaYokl0nLLLZf/v+yyy+avADvppJPSe++9ly666KK27xatvms01r3yyivTcccd1/Z1b+Hzzz+fJCAuvfTS6dRTT0177rlnW1kA3RUGt9xyy6bCYDj44IPTBx980HZ7/PHH89dUrrLKKk1vc8BU1Beg1zj99NPT9NNPn6/zVoW5lVdeOQfAjTbaaKJln3nmmXTIIYek3XbbbaLno6v55ZdfTjvttFN+XH0J/fjx49NTTz2VlllmmYmW33HHHfOX1++66665qzpCJ8DU2n777fP7y8MPP9zU8vHBNm6Vn//85+nAAw9MQ4YMaXqbWgiBPu/VV19NRxxxRDrxxBMnatnrSLQIVq2C9Y4++uj0u9/9Ln86j1tcFP+aa67JLYSNYbDy7W9/O80666zpsMMO65Z9ATj33HPT/vvvP8nzf/nLX9Kaa66Zg963vvWtNHr06EmWGTVqVLruuuvaXb8zAiHQp0VY22677fK4v0022aTdZaK7N1r5Pv300w7LueSSS/IyO++8c+5SroJmdDN35tprr03zzjtvuvnmm9M//vGPqdwbgJTHDjb68MMP02abbZZvTz75ZJ4A99Of/nSS5c4666y0ww47TPY3ougyBvq06CY+9thj05JLLpmWWmqpHP6qrpMXXngh7bHHHmnQoEHpiy++SHPNNVd64IEHOpwkEt3O0dVy7733pi222CJ3H8dYwc7GLR5zzDHp8MMPzy2LZ5xxRjrllFNatq9AuW655Zb8fnfUUUflnpCDDjooD1epF+9z0bp41113TXb5AiHQp8Ub4ze/+c38/+gGjgkeq622WqdjCNsTE1JCzDi+/vrr86frVVddtd2u5co555yTLz0TLZTRSrjNNtukQw89NAdPgO702muvpXfeeSfNPvvsbR9Ix4wZk3s+qg/B99xzT/6O7OWXX36yyxcIgT4tunnjU3F8cp5uuuk6XTbeQONahY888kjuFt50003zBJP6y9REV0uMJYzJKTGwuyPRnRwTUyKAxrY32GCD/CYcs/3OO++8bt1HgIUWWih/SI3eiBC9ITGGMN5/KldffXUeWzgl+tW6utgWQC+/9mB088Zs3/o3xqrLeL755kszzzzzRGMJo2Uvxvx997vfzZd1iDKuuuqqtvViLGLM7nv99dfb1q03duzYtN5666WhQ4fmN+DKfffdl5//7W9/m/bZZ5+W7jdQRg/IyJEj06KLLprHEMZltE477bQ84S3GPcd1U6PlMN7/Qox5jg+p8QF1cmkhBPq0mOUbt/Z01mUcb6ghxgnWu/DCC9P999+fBg8enMfixKUb6kUXTXwCj2AZy9ZbZ511cjjdd99980y/GOvTGFIBpsRss82WbrzxxrTffvulH/zgB2mFFVbIj6swGB+A431njTXWmKLyBULgSyu6kuPWmaqT5I033shdxdFSGJeaiUkq0aV86623ph//+Mdpww03zK2GMUmlf//+edB2e62HcembCI1xcesnnngi3XTTTS3bP+DLrdbQibv66qt3eG3CuBB/fFCdUgIh8KUVXbtxWZrOxPcWx8SQXXbZ
JS/72GOP5a+kC8OHD88zmKOLZv31108zzTRTWnjhhXM38RxzzNFhmTFbOWYnT0m3DcC0YAwhQEp5pl7MKG7mwtYAXzYCIQBA4XxTCQBA4QRCAIDCCYQAAIUTCAEACicQAgAUTiAEACicQAgAUDiBEACgcAIhAEDhBEIAgFS2/wfLymnJzRnqvgAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 800x600 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Box plot of annual income (年收入)\n",
    "plt.figure(figsize=(8, 6))\n",
    "sns.boxplot(x=data['年收入'])\n",
    "plt.title('年收入箱线图')\n",
    "plt.xlabel('年收入')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 273,
   "id": "19d314c4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--- '年收入' 异常值处理信息 ---\n",
      "Q1 (25th Percentile): 931133.00\n",
      "Q3 (75th Percentile): 1499974.00\n",
      "IQR (Q3 - Q1): 568841.00\n",
      "下限 (Lower Bound): 77871.50\n",
      "上限 (Upper Bound): 2353235.50\n",
      "\n",
      "原始数据行数: 7500\n",
      "删除异常值后行数: 6984\n",
      "共删除异常值 (行): 516\n",
      "\n",
      "DataFrame 已更新，'年收入' 的异常值已被移除。\n"
     ]
    }
   ],
   "source": [
    "column_name = '年收入'\n",
    "\n",
    "# 1. Compute Q1, Q3 and the IQR\n",
    "Q1 = data[column_name].quantile(0.25)\n",
    "Q3 = data[column_name].quantile(0.75)\n",
    "IQR = Q3 - Q1\n",
    "\n",
    "# 2. Set the outlier bounds (1.5 x IQR rule)\n",
    "lower_bound = Q1 - 1.5 * IQR\n",
    "upper_bound = Q3 + 1.5 * IQR\n",
    "\n",
    "print(f\"--- '{column_name}' 异常值处理信息 ---\")\n",
    "print(f\"Q1 (25th Percentile): {Q1:.2f}\")\n",
    "print(f\"Q3 (75th Percentile): {Q3:.2f}\")\n",
    "print(f\"IQR (Q3 - Q1): {IQR:.2f}\")\n",
    "print(f\"下限 (Lower Bound): {lower_bound:.2f}\")\n",
    "print(f\"上限 (Upper Bound): {upper_bound:.2f}\")\n",
    "\n",
    "# 3. Filter: keep only the rows whose value lies within the bounds\n",
    "data_before_drop = len(data)\n",
    "data_filtered = data[\n",
    "    (data[column_name] >= lower_bound) & \n",
    "    (data[column_name] <= upper_bound)\n",
    "].copy()  # .copy() avoids SettingWithCopyWarning on later assignments\n",
    "\n",
    "# 4. Update the DataFrame and report the result\n",
    "data = data_filtered\n",
    "data_after_drop = len(data)\n",
    "rows_dropped = data_before_drop - data_after_drop\n",
    "\n",
    "print(f\"\\n原始数据行数: {data_before_drop}\")\n",
    "print(f\"删除异常值后行数: {data_after_drop}\")\n",
    "print(f\"共删除异常值 (行): {rows_dropped}\")\n",
    "\n",
    "print(f\"\\nDataFrame 已更新，'{column_name}' 的异常值已被移除。\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 274,
   "id": "a7d2bb3d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAoQAAAIhCAYAAADXZqsSAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAJM1JREFUeJzt3Ql03VWdwPHbUqAtSylbka20SFlEwKUIwrAIIiqy6QiyKaODiqMzuAwgIIdNEBXEKiJFWQREYFgEdMayCCPIIlZA2YQWLCDI1gVoaUvfnN8952XSNEnTkjRpfp/POY80yXsv/yy8fHPv/943oNFoNAoAAGkN7O0DAACgdwlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCoM+aNGlSefnll1tenzt3brnlllvKvHnzytLijTfeWOBtniAK6GsEIdBjnnvuuXLOOefUl4vjwx/+cLniiitaXr/zzjvL+973vnL22Wd3+T7uvffe8uCDDy7Wx7/rrrvKpZdeWl555ZUF3jdhwoRy2223dRp3f/3rX8uoUaNqxLb2yU9+smyyySb1tg8//HCZOXPmAreN+J06deoiX5amWAb6jkG9fQBA/zVu3LhyyimnlBVWWKEcfPDBi3z7IUOGlOWWW67l9RtuuKEMGzasfOpTn+rS7V9//fVyyCGHlKeeeqpceeWV5f3vf3+XP3YE2eGHH16eeeaZ8p73vKe+HqN9gwYNKgMGDCgHHHBA2X777csOO+zQ4X386Ec/KiuuuGK9fWurrbZaGTp0aA3Cgw46qGy88cblkksume86d9xxR9lxxx3Lovrzn/9c3va2ty3y7YDcjBACPeKRRx4p3/3ud8tOO+1UPvKRj7Q7mvXSSy+Vv//97+Vvf/tbvc19991XA2natGn19YED//8hKmLsZz/7WfmXf/mXGlmt3x4jbBF/bS2//PI1rN797neXPffcc4GRus4ceeSRZeLEiWXGjBllo402Kuuuu255y1veUk4++eRy4oknlhdeeKH85je/Kausskq9rLTSSuXAAw9suX2Mip577rn1EuHX+vgiaiN24/OLEIz3tx2FjPAM8fHj/XGJr9cJJ5xQXnzxxZa3NS9Tpkyp128d0ABdJQiBbheBtt9++5U5c+aUBx54oLz1rW+tl4iqGB1rvr7hhhvWyzve8Y56uwidCKTBgwcvcJ+/+tWvytNPP13OPPPMOkLXvEQ4xWjbqaee2u6xRHzFyOJWW21VjymiqjMRmF/72tfqx/npT39a7rnnnpaRt9mzZ9djPeOMM8qFF15Yj2edddYpxx57bHn88cfL97///Zb7ifvYd9996yjid77znToK2Ppjx8eJkcuYVt5ggw3qVPh1113X8v5ll112gWP71re+VT/PuE3rsG49bb3MMsss5LsDsCBTxkC3ijg57LDD6mhfjI7967/+a8v7fvCDH9Q4euKJJ9q9bXN0K0b22ouhvffeu07DNsXUaIxCxtTqyiuv3OExRWD+5Cc/qdf/5je/WY+hIzHSFiODv/jFL+rI4kMPPVT22WefOnUc4Rbn/J100kl1KjocccQR9XNec801W+4jjjEC9v777y/PP/98nTqP611//fX19tdee209rzGOJwJzyy23rPe32WabdRiE8fWMSI0o3WabbVre/va3v7386U9/anm99agqQFcJQqDbRBjFeXcXX3xxDaDWMdj2erH4IUbJZs2aVUf4mlOk7Ymp3lhQEotMYuXxe9/73jr6GCuQt9hii7pwY2Fuv/32OqIYi1y+/vWvl1VXXbXd60X03XjjjfXfEZHHHXdcjcHmApXmKOC3v/3t+W532WWXlZtuuqn+OxbCxGhnjIhGoMUxHnXUUeWDH/xgvf+xY8fWqeAnn3yyHlNT3KY9MT39z//8z/Vzjo8Ro4l333132WWXXeoxikDgzfIoAnSbiJs4xy/Oc4sFHM0p3ebl
3//932sExbRmvB4jgTGlG7HXkYjG//iP/6iLUqZPn96y6jimTSMsuxKDEZ8xuhb3E8d03nnndenziVG61iN18e84v7HtuZBf/epX66KTpu9973sto4MjRoyoq6Ljc45zDmPU9KMf/Wi93aOPPlouv/zyGosRiTHK2IzP1iJg430RnTFaGSOce+21Vx2JjNsBvFlGCIFuFYESl+YCjtah1HbKuDlC2N45g00RUkcffXTZeeedy0UXXVRHH0NM644ZM6YMHz58occUIRUBGVvFxMhcjBJ+5Stf6fB8u5jGjdiM+Hv22WfrlG6EZPNjt6f1SF+MCIYI4FhQ05ziPeuss8of//jHup1NHEesPo7rxvmNn//85+vLCMi2Yso5FqXE1yqOLc5PfOc73ykGgW5jhBDoEV1Z3BDXiS1pFnbd/fffv4ZSjI7FyFuMrP33f/93jcSuLHCJEbaPfexjdQHIv/3bv5XJkyeXn//85x3eJlYRx7mGEYURnDE9HK83tV7UEpeYVm7rggsuqKN/p512Wo25WFASi1BixXKMIK6++ur1/bGX4bbbbltXZcf5gG2/FnH8caxf+MIX6mhofB1iRDXicfPNN6+3/fKXv1xHIwEWlyAEuk2MBr766qt14UNXNbdkee211+roWYgFFjESFgspWouVyTGqFnF21VVX1VXDC3P88cfXc/BiUUqI0b7ddtutTmvHx2xPjOpFPDant2OErvV2Lm23fIlFJq3FOYiHHnponeJef/316wjoMcccU04//fS6Inn33XevC1ZiS5wYLYxzLWPE8x//+McCxxK3jfiNEIzzBiOgYzQxRgfj67XHHnvUcw/XWGONLn7FAdrRAOgmv/71r2P/k8W6bLzxxo1jjz22sckmmzSOO+64xu23397YaqutGueff/58H2P8+PH1+ptvvnlj3rx5nR7PhAkTGgMHDmycfvrp8739zjvvrPdxyCGHtHu7uN+dd965XkaOHNm4+uqrG0OHDm2cd955HR7/jjvu2HL76dOnN04++eTG9ddf33jwwQcb06ZNW+BjjBs3rrHWWms1VlxxxcYBBxzQmDt37nzvnzhxYr3fGTNmtLxtzpw5jeeee65x0kknNdZcc83GKaec0vK+KVOm1OtPnjy5068JQHuMEALdZtddd62jYjHNGaOFbUfSYqRt5MiR870tFnzE6tkY7YrRvNiOJaZsYyVxe1PJsXdfiFXCcbuOxL6BsTI3RgNjSrW1GGWMEbw4J7G5bUxrMZoY5xw2N5qO7W5iMUuct9eVEcLYpDpGBGNVcSyeif0FY6o6poWb4nzAOD8xziGMDbfjc13YcxzH1zc22I77vPnmm+siGYDuIAiBbhOhEjEU05xd3SC59ebSzaeFa08EZoRdxFc8W0gsTPnABz7Q8gwdbbeYiWdIWXvttet+gu0dS2wfE9O2cT5fRFbzfuL8vDjnMBaAtJ4m/tCHPtTp1jitYy7OOdx0003r5xTnBf74xz+u/47zEeNcwlgxHecdxsvY+PoPf/hDvV1E8C9/+cuWBTdtjR8/vm52HZt3x/mDzWnw1jyXMbA4rDIGlpiInIWNgrW9flzinLxYFRzn9MW5g7HlSsRUvIxFGrHoJBZdxEKLGGWMIBs9enT59a9/3eGG1fH0d7ENTOzlFxtGx9PnRZjFiuBYEBLhFS9bh1lzxXR70Rox1xRb7jz22GN1RXXcfwRynB8YW9/EVjHNhSFxPmO8L0YxI27jbc3jbX6s2HcxYrL5ceO44hLXjeCNjxNitLH17QAWhSAElpiY4m1v5KsjMa0aK4IjomJVbjzDR2zsHGKxRiyyiJG+Sy+9tL499gmMIIqRv4i8eJq8zsSI3f/+7/+WT3/603UvwOaeg81nIYnFLq2fg7j577ZRGyOW11xzTcvrEaYxKthajJzGIpKYOo7p5Hgu4xCfW0RgjFR+9rOfrSObofl1imhcFIIQWBwD4kTCxbolwCKK5+GNp5qLVb+LIlYD
N0fJuhKdcenq9TsTewXGBtLrrbde6Wlt92OM7WjiKfniGGI0c2Hi3Mo4zliZ3dwHEaCrBCFAHxQLc2Lfwg033LDD8yoBuosgBABIzipjAIDkBCEAQHKCEAAgucXediY2P33mmWfqVgpOeAYA6HtiqUjsVhD7lg4cOLD7gzBicElsxQAAwJsTz8a07rrrdn8Qxshg8wN09EwAAAD0nnh++RjAa3Zbtwdhc5o4YlAQAgD0XQs7vc+iEgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByg3r7AICOPffcc2XatGm9fRgsIcOGDSsjRozo7cMAEhKE0Idj8KCDDylzZr/e24fCErLscsuXi392kSgEljhBCH1UjAxGDM4cvWOZN3jYm7qvgTOnliGTbyszR+1Q5g1ZpduOke4zcNa0UibdWr/vghBY0gQh9HERg/NWWL177mvIKt12XwD0HxaVAAAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASK5fBOGsWbPKo48+Wl8CAH2L39N9X78Iwr/97W/lsMMOqy8BgL7F7+m+r18EIQAAi08QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAguUFlKfDGG2+U+++/v7z00ktl5ZVXLpMmTSrPPvtsWXvttctee+3V24cHAHRi7ty59eWRRx5Zpk6dWhqNRhk8eHBZY401yuuvv15efPHF+ru+
rVGjRpUXXnihvPrqq2XAgAH1+vFy0KBBZfr06WXOnDn1vuL1uP3AgQPL8ssvX6/3lre8pbz22mtlypQp9eVKK61Uj+Pll18us2bNqvffvK8hQ4aUGTNm1PtqivsK8+bN6/Rzi/tdbbXVyrRp0+r9zpw5s93rxTHF8cYxrLXWWuXTn/502XrrrcsyyyxT+oI+H4S33XZbOfvss2sAtuecc84pu+yyyxI/LgBg4eL39GWXXVb/HTHWFPEUsdaZyZMnz/d6Ry3Q2iuvvFID8+GHH57v7RGibUUARqTNmTNngfctLASbIiTjsjDPP/98y78fe+yxcvTRR9cY/cY3vlF22GGH0tsG9vUYPP7448vo0aPLrrvu2lLiG264Yf33nnvuWUcMf/Ob3/TykQIAncUgC4rRwgjC6J3e1meDMIZ+Y2Rw2223rV+sW265pQwfPrxcffXVZfz48eW9731vueeee+oPWkRi6yFpAKB3zZ49u1x++eW9fRh91oABA1r+PW7cuHanzPvklHHM8celKebue1KcMxhDw8cdd1y57rrr6hcq5ttjeDUceOCB5Qtf+EJ58MEH63mEF198cbnqqqta3g9LuyeffLK3D4Fe4PtOf3HjjTd2edo1o0ar8xVjOjm65x3veEevHU+X6+nUU08tJ5xwQllSYgFJ84TS+KEKMVrYFG9vXm+LLbao/47rNa8LsDQ65ZRTevsQgF7Q7J4+H4Rx8uOXv/zl+UYI11tvvZ46rrLqqqu2nFAaq4nD73//+7LHHnu0vL15vXvvvbf+O84z/PjHP95jxwRLeqRIHORzzDHHlJEjR/b2YcCbFgM0powXvXv6fBDGMu64LCkx6hfLsi+55JJ6DmGcmPqTn/yk7L777nUpeLw9lpRvttlm9f1h3333LWPGjFlixwjQ3SIGPY7RH2ywwQblyiuvNG3cyTmEzWnj2JKmOdvZW/rsopLYl+fwww+vo4Innnhi2Wmnnepy9X322ad85jOfKXfccUcZO3Zs2X///VuWezt/EAD6huWWW86sXRfPIfziF7/Y6/sR9tkgDLEvT5y3GBtR33TTTfVtEX/xevjlL39Zp6532223Xj5SAKCtz33uc3XghvbFQFYMevWFfQj7/JBafJG22267Tp+p5IknnrAXIQD00SiM3+Ux6xfbx3mmkrmeqWRxxRer9VLsmCoGAJYOzVO6vvWtbzlHto/q01PGAAD0PEEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyQlCAIDkBCEAQHKCEAAgOUEIAJCcIAQASE4QAgAkJwgBAJIThAAAyfWLIFx//fXLueeeW18CAH2L39N936DSDwwePLiMGTOmtw8DAGiH39N9X78YIQQAYPEJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA
5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSG9TbBwB0buCsaW/+PmZOne8l/fP7DLC4BCH0UcOGDSvLLrd8KZNu7bb7HDL5tm67L7pffL/j+w6wpAlC6KNGjBhRLv7ZRWXaNCNHWUQMxvcdYEkThNCHRRwIBAB6mkUlAADJCUIAgOQEIQBAcoIQACA5QQgAkJwgBABIThACACQnCAEAkhOEAADJCUIAgOQEIQBAcoIQACA5QQgAkJwgBABIThACACQnCAEAkhOEAADJCUIAgOQEIQBAcoIQACA5QQgAkJwgBABIThACACQnCAEAkhOEAADJCUIAgOQEIQBAcoIQACA5QQgAkJwgBABIThACACQnCAEAkhOEAADJCUIAgOQEIQBAcoIQACA5QQgAkJwgBABIThACACQnCAEAkhOEAADJCUIAgOQEIQBAcoMW94aNRqO+nD59enceDwAA3aTZac1u6/YgnDFjRn253nrrLe5dAACwBES3DRs2rMP3D2gsLBk7MG/evPLMM8+UlVZaqQwYMODNHCO9/JdDRP2UKVPKyiuv3NuHQy/wM4CfAfwM9F+ReRGDa6+9dhk4cGD3jxDGna677rqLe3P6mHgA8CCQm58B/AzgZ6B/6mxksMmiEgCA5AQhAEBygjC55Zdfvhx//PH1JTn5GcDPAH4GWOxFJQAA9A9GCAEAkhOEAADJCUKABF544YUyatSo8sQTT3Tp+nvuuWfdY7Z52XXXXXv8GIHeIwgT+POf/1zGjh1bhg8fXr72ta8t9OlrwhZbbDHfL4PPfOYzS+RY6RsxcOutt5ZNN920rL766uWMM87o8eOj57//e+yxR5e//+EPf/hDeeCBB8rLL79cL9dee22PHiM9K75/o0ePLoMGDSpbbbVVeeihhxZ6G48DuQjCfu71118vH/nIR8q73vWu+gD/4IMPlgsuuKDT27z22mvl8ccfL//4xz9afhmMGzduiR0zvRsDzz//fB0d+sQnPlF+//vfl0suuaTccsstPX6c9Jz999+/HHDAAV2+/tNPP13/cNx8883LKqusUi8rrLBCjx4jPScezw899NBy2mmn1e/tmDFjFvpHvseBhGKVMf3X1Vdf3Rg+fHjj1Vdfra//6U9/amy33Xad3uZ3v/tdY5tttllCR0hP22WXXRpnnXVWDAs3Jk+evNDrn3nmmY1NNtmkMW/evPr6Nddc0zjwwAOXwJHSUyZNmlRfdvVn4KqrrmqsscYajXXWWacxdOjQxn777dd46aWXlsCR0hOuu+66xo9//OOW12+++ebGkCFDOr2Nx4F8jBD2c/fdd1/ZZpttytChQ1umgmOUsDN33313eeqpp8oaa6xRRwY+//nP15FGlk7jx48vX/rSlxbpZ2bnnXdueY7yrbfeutx77709eIT0tDhdYFE8/PDDZcsttyw33HBDufPOO8vkyZPL0Ucf3WPHR8+KGYLDDjus5fVHHnmkbLTRRp3exuNAPoIwwROWt/5lEP9zL7PMMnUauCPxYLH99tuX3/3ud+V//ud/yoQJE8qZZ565hI6Y3o6Btj8z8bymzzzzTA8cGX1VxF/8fx9R+Pa3v718+9vfLldeeWVvHxbdYPbs2eW73/1u+dznPtfp
9TwO5DOotw+AnhUnELfdeX7w4MH1PMFYZNKec845Z77Xv/GNb5Tvf//75aijjurRY6Vv/sw0f17Ia8011ywvvvhinSnwTBZLt3g2kjgfdGHnEHocyMcIYT+36qqr1pODW5sxY0ZZbrnlFumXQZyITM6fmUX9eWHpt99++9UZgqZYVDBixAgxuJS7+eabyw9/+MNy6aWXlmWXXbbT63ocyEcQ9nOx3Uw8mDfFuUDxV378z96RbbfdtkyZMqXl9bj9yJEje/xY6Zs/MxMnTizrrLNOrx4TPSOmBefMmbPA22Oa+IgjjqhReM0119Qp5DiXmKVXPPbHiuEIws0222yh1/c4kI8g7Od22GGH+qB//vnn19e/+c1v1g1m4zzCqVOnljfeeGOB27ztbW8rn/3sZ8tdd91VLrzwwnq+iV8GeWIgtpq4/fbby4033ljff/rpp5cPfOADvXKM9KxYZBYLR9o68sgj6/t23333+v/+4YcfXo455pheOUbevJkzZ9aFJXvttVfZZ599yiuvvFIvsfDc4wAtenuZMz3v2muvrVtHrLbaanUrib/85S/17fHtnzhx4gLXf/nllxt777133ZZg5MiRjbPPPrsXjpru1nbLkfjexrZE7fnRj37UWHbZZeuWRaNGjWo8++yzS/BIge4UW8bE//9tL/F44HGApgHxn//PQ/qrZ599tm4ZEFvQrLbaar19OCwlU0yx/cg//dM/lRVXXLG3DwfoBR4H8hCEAADJOYcQACA5QQgAkJwgBABIThAC/UZsjxELp9rbSqWrYhuOrj53d2zdAdAfCEKg3/iv//qv8sQTT5Ttttuuy7e5+OKL695sTbHvZtx+YevtHn300TJ69OhyySWXvKljBmjPCy+8UJ9POh7TFvWZhr74xS+WReW5jIF+Yd68eeXkk0+uIbf99tu3bMgbT7cVG7HPnTu3Phfrl770pXLooYeW6667rnzqU5+qz9HafEq2uO3Pf/7zMm7cuDJgwICW+549e3Z9qq/WbxszZkz53ve+V58TNkYm474AuisGYzPxRY3BX/3qV+W3v/1teeSRRxb5YwpCoF+Ip+SKaIs9N5vhttVWW5XvfOc79dl5WvvLX/5Sn46tbcTFVPOTTz5ZDjzwwPp6BGSImHzggQfKxhtvPN/1DzjggDJo0KByyCGH1KnqTTbZpIc/SyCD/fffvz6+xDOGddWrr75an1Xo1FNPLausssoif0xTxsBSL557+7jjjiunnXbafKN4HYkRweaoYGsnnHBC+cUvflH/Oo/LjjvuWK644oo6Qtg2Bps++tGPlpVWWslTuwHdZvz48XU2o6177rmnvOc97ynDhg0r++67b5k2bdp8j1/xWBV/pE6YMKHOmiwKQQgs1eIB8GMf+1g976+j51qNqeAY5Zs1a1aH93PRRRfV6xx00EHlxRdfbAnN9ddfv9OPf+WVV5YRI0aU66+/vjz22GNv8rMBKPXcwbamTp1aPvjBD9bL/fffXxfAfeUrX6nvi5mNs846q95u0qRJ9fnI995770WKQlPGwFItpolPOumk8ta3vrVstNFGNf7ivMDw+OOP13P8hg4dWt54442y+uqrl9tvv73DRSIx7fyf//mf5dZbby0f/vCH64NsnCvYkXiwPfHEE8uxxx5bRxbPPvvscsYZZ/TY5wrkdcMNN9THu+OPP77OhHz1q1+tp6uECy+8sP5hetNNN9XHvwjFkSNHlhtvvLHstttuXbp/QQgs1eKBsfmAF9PAF1xwQXn3u9/d6TmE7YkFKSFWHF9zzTX1eVvf9a53tTu13HTuuefWrWdihDIejPfaa6/y9a9/vYYnQHd66qmnyvPPP1+GDx/e8gfpjBkz6sxHvC8e55p/DMdpLPEHcsxaCEIghZjmjdG/+Ms5VhN3Jh5AY0Xw3XffXaeFd99997rApPU2NZ/4xCfquTixOCVO7O5ITCfHwpQI0PjY73vf+8pmm21WjjrqqHLeeed16+cIsO66
69Y/UmM2IsRsSJxDGI8/8b6HHnpovse6iMR11lmny/c/oLGwzbYA+vjegzHNGydSxwNjazFlvNZaa5UVVlhhvnMJY2Qvzvn7+Mc/Xrd1iPu47LLLWm4X5yLG6r6nn3665batxXY2O+20U52Sufzyy1veftttt9W3/+AHP6ir/QDe7AzI5MmTywYbbFDPIdx0003ruYKx4C3Oe459UyP8/vrXv9aZkXhbLDqJrbPOP//8etv2HsPaFUEI0B9tueWWjQkTJnR6nSuuuKKx3377tbz+05/+tDFkyJDGiBEjGmeeeeYC158+fXpj1113bbzzne9svPLKKwu8/8gjj4w/shvHHHNMY/bs2d30mQAZlVIakydPbnn97rvvbmy99daNoUOHNsaOHdu46667Wt537bXXNrbYYovG4MGDG5tvvnnjjjvuWKSPZcoY6LdiKjkunWlOkvz973+vU8UxUhhbzcQilZhSjo1ejzjiiLLLLrvUUcNYpDJw4MB68nZ7f3nH1jdxXs8pp5xS7rvvvroBNsDiaDuJO3bs2A73Jtxzzz3rZXEJQqDfiqnd2JamM/G8xbEw5OCDD67X/eMf/1ifki5MnDixrmCOKZqdd965DBkypKy33np1mnjVVVft8D5jtXKsTo7zCgGWBs4hBCilrtSLFcVd2dgaoL8RhAAAyXmmEgCA5AQhAEByghAAIDlBCACQnCAEAEhOEAIAJCcIAQCSE4QAAMkJQgCA5AQhAEDJ7f8AK0GCLJFP5tsAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 800x600 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Box plot of annual income (年收入) after outlier removal\n",
    "plt.figure(figsize=(8, 6))\n",
    "sns.boxplot(x=data['年收入'])\n",
    "plt.title('年收入箱线图')\n",
    "plt.xlabel('年收入')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67b701a6",
   "metadata": {},
   "source": [
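    "Why does the box plot still show points after the outliers were removed?\n",
    "\n",
    "A box plot recomputes its box and whiskers from whatever data it is given, so the process is effectively iterative:\n",
    "\n",
    "1. On the filtered data, Q1 and Q3 are recomputed, so the lower and upper fences move as well, and some of the remaining points can fall outside the new, tighter fences.\n",
    "2. Even so, the new box plot shows a more concentrated distribution.\n",
    "\n",
    "To avoid this, set fixed bounds once and reuse them instead of recomputing them on the filtered data."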
   ]
  },
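  {
   "cell_type": "markdown",
   "id": "b7e1c2d4",
   "metadata": {},
   "source": [
    "The effect described above can be reproduced with a minimal sketch. It is an illustration under stated assumptions, not the notebook's own data: the lognormal sample stands in for an income column, and `iqr_bounds` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def iqr_bounds(values, k=1.5):\n",
    "    # Return the (lower, upper) fences of the k * IQR rule.\n",
    "    q1, q3 = np.percentile(values, [25, 75])\n",
    "    iqr = q3 - q1\n",
    "    return q1 - k * iqr, q3 + k * iqr\n",
    "\n",
    "# Synthetic right-skewed data standing in for an income column (assumption).\n",
    "rng = np.random.default_rng(0)\n",
    "income = rng.lognormal(mean=14.0, sigma=0.5, size=5000)\n",
    "\n",
    "# First pass: filter with fences computed on the raw data.\n",
    "low1, high1 = iqr_bounds(income)\n",
    "kept = income[(income >= low1) & (income <= high1)]\n",
    "\n",
    "# Second pass: fences recomputed on the filtered data are tighter,\n",
    "# which is what a box plot drawn on the filtered data does.\n",
    "low2, high2 = iqr_bounds(kept)\n",
    "flagged_again = int(((kept < low2) | (kept > high2)).sum())\n",
    "\n",
    "# With the original (fixed) fences, nothing in the filtered data is flagged.\n",
    "flagged_fixed = int(((kept < low1) | (kept > high1)).sum())\n",
    "\n",
    "print(f'first-pass upper fence:  {high1:.0f}')\n",
    "print(f'second-pass upper fence: {high2:.0f}')\n",
    "print(f'flagged with recomputed fences: {flagged_again}')\n",
    "print(f'flagged with fixed fences:      {flagged_fixed}')\n",
    "```\n",
    "\n",
    "Reusing the first-pass fences is one way to read the fixed-bound suggestion; a bound chosen from domain knowledge would work the same way."
   ]
  },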
  {
   "cell_type": "code",
   "execution_count": 275,
   "id": "cd5d5fdf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 6984 entries, 0 to 7499\n",
      "Data columns (total 17 columns):\n",
      " #   Column    Non-Null Count  Dtype  \n",
      "---  ------    --------------  -----  \n",
      " 0   房屋所有权     6984 non-null   int64  \n",
      " 1   年收入       6984 non-null   float64\n",
      " 2   当前工作年限    6984 non-null   float64\n",
      " 3   税收留置权     6984 non-null   float64\n",
      " 4   开放账户数量    6984 non-null   float64\n",
      " 5   信用历史年限    6984 non-null   float64\n",
      " 6   最大开放信用额度  6984 non-null   float64\n",
      " 7   信用问题数量    6984 non-null   float64\n",
      " 8   距上次拖欠月数   6984 non-null   float64\n",
      " 9   破产次数      6984 non-null   float64\n",
      " 10  贷款目的      6984 non-null   int64  \n",
      " 11  贷款期限      6984 non-null   int64  \n",
      " 12  当前贷款金额    6984 non-null   float64\n",
      " 13  当前信用余额    6984 non-null   float64\n",
      " 14  月债务       6984 non-null   float64\n",
      " 15  信用评分      6984 non-null   float64\n",
      " 16  信用违约      6984 non-null   int64  \n",
      "dtypes: float64(13), int64(4)\n",
      "memory usage: 982.1 KB\n"
     ]
    }
   ],
   "source": [
    "data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c4a87ab",
   "metadata": {},
   "source": [
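    "As `data.info()` shows, this approach removes the entire row for each outlier (6,984 of the original 7,500 rows remain). An alternative is to blank out only the outlying feature value, keep the rest of the row, and then fill the gap with a missing-value imputation method."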
   ]
  },
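  {
   "cell_type": "markdown",
   "id": "c3f9a5e6",
   "metadata": {},
   "source": [
    "A minimal sketch of that alternative, assuming a toy DataFrame: the column names mirror the notebook, the values are made up, and median imputation is only one possible choice.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frame; column names follow the notebook, values are invented.\n",
    "df = pd.DataFrame({\n",
    "    '年收入': [900000.0, 1100000.0, 1300000.0, 1500000.0, 9000000.0],\n",
    "    '信用评分': [700.0, 720.0, 690.0, 710.0, 730.0],\n",
    "})\n",
    "\n",
    "col = '年收入'\n",
    "q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)\n",
    "iqr = q3 - q1\n",
    "lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr\n",
    "\n",
    "# Keep values inside the fences; anything else becomes NaN (the row stays).\n",
    "inside = df[col].between(lower, upper)\n",
    "df[col] = df[col].where(inside)\n",
    "\n",
    "# Impute the masked cells with the median of the remaining values.\n",
    "df[col] = df[col].fillna(df[col].median())\n",
    "\n",
    "print(len(df))\n",
    "print(df[col].tolist())\n",
    "```\n",
    "\n",
    "All five rows survive, so the other features of the affected sample remain available to the model."
   ]
  },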
  {
   "cell_type": "markdown",
   "id": "8a1b4a23",
   "metadata": {},
   "source": [
    "## 1.8 Visual analysis\n",
    "\n",
    "Not repeated here: choose among the three kinds of plots as needed."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d4f4371",
   "metadata": {},
   "source": [
    "## 1.9 Other core steps"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0f8564a",
   "metadata": {},
   "source": [
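    "\n",
    "**Feature engineering**\n",
    "- **Deriving new features**: creating new features from existing ones may improve model performance. For example, a debt-to-income ratio, i.e. `月债务` (Monthly Debt) divided by `年收入` (Annual Income), reflects how heavy a customer's debt burden is.\n",
    "- **Feature selection**: use correlation analysis or similar methods to keep the features most strongly related to the target `信用违约` (Credit Default) and drop weakly related or redundant ones, which reduces model complexity and the risk of overfitting.\n",
    "\n",
    "\n",
    "Handling class imbalance is covered later.\n"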
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0fb0c52",
   "metadata": {},
   "source": [
    "此时你可能会好奇，怎么还没有归一化/标准化？这就需要我们引入数据泄露的观点了\n",
    "\n",
    "我们之所以推迟归一化或标准化步骤，正是为了避免关键的训练集/测试集数据泄露问题。一旦在划分数据集之前对全集应用此类预处理，训练过程就间接利用了测试集的均值和标准差等统计信息，这会导致对模型在未知数据上性能的乐观估计。课上反复提及的核心：\n",
    "\n",
    "考试理论：\n",
    "1. 不要提前知道考试题。\n",
    "2. 调参就需要考2次"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f23e13a",
   "metadata": {},
   "source": [
    "# 二、机器学习模型建模"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c31e7dd8",
   "metadata": {},
   "source": [
    "## 2.1 数据划分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 276,
   "id": "f1a44b64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['房屋所有权', '年收入', '当前工作年限', '税收留置权', '开放账户数量', '信用历史年限', '最大开放信用额度',\n",
       "       '信用问题数量', '距上次拖欠月数', '破产次数', '贷款目的', '贷款期限', '当前贷款金额', '当前信用余额', '月债务',\n",
       "       '信用评分', '信用违约'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 276,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 277,
   "id": "3dd4664a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "训练集形状: (5587, 16), 测试集形状: (1397, 16)\n"
     ]
    }
   ],
   "source": [
    "# 划分训练集和测试集 \n",
    "from sklearn.model_selection import train_test_split\n",
    "X = data.drop(['信用违约'], axis=1)  # 特征，axis=1表示按列删除\n",
    "y = data['信用违约']  # 标签\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 划分数据集，20%作为测试集，随机种子为42\n",
    "# 训练集和测试集的形状\n",
    "print(f\"训练集形状: {X_train.shape}, 测试集形状: {X_test.shape}\")  # 打印训练集和测试集的形状"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "735d2f80",
   "metadata": {},
   "source": [
    "## 2.2 数据归一化\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7333370c",
   "metadata": {},
   "source": [
    "只需要对连续特征归一化即可，离散特征编码后虽然是数值，但是不用动"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 278,
   "id": "c1c3f5ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>房屋所有权</th>\n",
       "      <th>年收入</th>\n",
       "      <th>当前工作年限</th>\n",
       "      <th>税收留置权</th>\n",
       "      <th>开放账户数量</th>\n",
       "      <th>信用历史年限</th>\n",
       "      <th>最大开放信用额度</th>\n",
       "      <th>信用问题数量</th>\n",
       "      <th>距上次拖欠月数</th>\n",
       "      <th>破产次数</th>\n",
       "      <th>贷款目的</th>\n",
       "      <th>贷款期限</th>\n",
       "      <th>当前贷款金额</th>\n",
       "      <th>当前信用余额</th>\n",
       "      <th>月债务</th>\n",
       "      <th>信用评分</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3161</th>\n",
       "      <td>1</td>\n",
       "      <td>0.399142</td>\n",
       "      <td>0.7</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.300</td>\n",
       "      <td>0.409683</td>\n",
       "      <td>0.000378</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.315217</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.005110</td>\n",
       "      <td>0.092358</td>\n",
       "      <td>0.289425</td>\n",
       "      <td>0.007220</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4119</th>\n",
       "      <td>1</td>\n",
       "      <td>0.363551</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.200</td>\n",
       "      <td>0.189944</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.152174</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.002002</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.074997</td>\n",
       "      <td>0.022238</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4317</th>\n",
       "      <td>0</td>\n",
       "      <td>0.283759</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.225</td>\n",
       "      <td>0.256983</td>\n",
       "      <td>0.000294</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.152174</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.051029</td>\n",
       "      <td>0.118563</td>\n",
       "      <td>0.022527</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4091</th>\n",
       "      <td>1</td>\n",
       "      <td>0.233163</td>\n",
       "      <td>0.8</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.275</td>\n",
       "      <td>0.309125</td>\n",
       "      <td>0.000257</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.293478</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.002017</td>\n",
       "      <td>0.041781</td>\n",
       "      <td>0.194017</td>\n",
       "      <td>0.021083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5576</th>\n",
       "      <td>1</td>\n",
       "      <td>0.450868</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.250</td>\n",
       "      <td>0.256983</td>\n",
       "      <td>0.000384</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.293478</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.004524</td>\n",
       "      <td>0.037528</td>\n",
       "      <td>0.287096</td>\n",
       "      <td>0.021372</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2446</th>\n",
       "      <td>1</td>\n",
       "      <td>0.368087</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.075</td>\n",
       "      <td>0.299814</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.152174</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.004390</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.119249</td>\n",
       "      <td>0.022383</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4782</th>\n",
       "      <td>1</td>\n",
       "      <td>0.774397</td>\n",
       "      <td>0.6</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.350</td>\n",
       "      <td>0.182495</td>\n",
       "      <td>0.000412</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.152174</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.078432</td>\n",
       "      <td>0.352548</td>\n",
       "      <td>0.014440</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>413</th>\n",
       "      <td>0</td>\n",
       "      <td>0.526158</td>\n",
       "      <td>0.6</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.250</td>\n",
       "      <td>0.242086</td>\n",
       "      <td>0.000100</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.152174</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.001040</td>\n",
       "      <td>0.014725</td>\n",
       "      <td>0.159529</td>\n",
       "      <td>0.017906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2032</th>\n",
       "      <td>2</td>\n",
       "      <td>0.981075</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.125</td>\n",
       "      <td>0.072626</td>\n",
       "      <td>0.000544</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.152174</td>\n",
       "      <td>0.0</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>0.005882</td>\n",
       "      <td>0.042455</td>\n",
       "      <td>0.228163</td>\n",
       "      <td>0.008953</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1097</th>\n",
       "      <td>0</td>\n",
       "      <td>0.367261</td>\n",
       "      <td>0.7</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.350</td>\n",
       "      <td>0.158287</td>\n",
       "      <td>0.000600</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.836957</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.001040</td>\n",
       "      <td>0.054314</td>\n",
       "      <td>0.077721</td>\n",
       "      <td>0.022238</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1397 rows × 16 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      房屋所有权       年收入  当前工作年限  税收留置权  开放账户数量    信用历史年限  最大开放信用额度  信用问题数量  \\\n",
       "3161      1  0.399142     0.7    0.0   0.300  0.409683  0.000378     0.0   \n",
       "4119      1  0.363551     0.2    0.0   0.200  0.189944  0.000000     0.0   \n",
       "4317      0  0.283759     0.2    0.0   0.225  0.256983  0.000294     0.0   \n",
       "4091      1  0.233163     0.8    0.0   0.275  0.309125  0.000257     0.0   \n",
       "5576      1  0.450868     1.0    0.0   0.250  0.256983  0.000384     1.0   \n",
       "...     ...       ...     ...    ...     ...       ...       ...     ...   \n",
       "2446      1  0.368087     0.5    0.0   0.075  0.299814  0.000000     1.0   \n",
       "4782      1  0.774397     0.6    1.0   0.350  0.182495  0.000412     2.0   \n",
       "413       0  0.526158     0.6    0.0   0.250  0.242086  0.000100     0.0   \n",
       "2032      2  0.981075     0.5    0.0   0.125  0.072626  0.000544     0.0   \n",
       "1097      0  0.367261     0.7    0.0   0.350  0.158287  0.000600     0.0   \n",
       "\n",
       "       距上次拖欠月数  破产次数  贷款目的  贷款期限    当前贷款金额    当前信用余额       月债务      信用评分  \n",
       "3161  0.315217   0.0     0     1  0.005110  0.092358  0.289425  0.007220  \n",
       "4119  0.152174   0.0     0     1  0.002002  0.000000  0.074997  0.022238  \n",
       "4317  0.152174   0.0     0     0  1.000000  0.051029  0.118563  0.022527  \n",
       "4091  0.293478   0.0     0     0  0.002017  0.041781  0.194017  0.021083  \n",
       "5576  0.293478   1.0     0     0  0.004524  0.037528  0.287096  0.021372  \n",
       "...        ...   ...   ...   ...       ...       ...       ...       ...  \n",
       "2446  0.152174   0.0     0     1  0.004390  0.000000  0.119249  0.022383  \n",
       "4782  0.152174   0.0     0     1  1.000000  0.078432  0.352548  0.014440  \n",
       "413   0.152174   0.0     0     0  0.001040  0.014725  0.159529  0.017906  \n",
       "2032  0.152174   0.0     7     1  0.005882  0.042455  0.228163  0.008953  \n",
       "1097  0.836957   0.0     0     0  0.001040  0.054314  0.077721  0.022238  \n",
       "\n",
       "[1397 rows x 16 columns]"
      ]
     },
     "execution_count": 278,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "from sklearn.model_selection import train_test_split \n",
    "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
    "\n",
    "# ----------------------------------------------------------------------\n",
    "# 定义连续特征列 (需要归一化的特征)\n",
    "# ----------------------------------------------------------------------\n",
    "continuous_features = [\n",
    "    '年收入', \n",
    "    '当前工作年限', \n",
    "    '开放账户数量', # 虽为计数，但一般也进行缩放\n",
    "    '信用历史年限', \n",
    "    '最大开放信用额度', \n",
    "    '距上次拖欠月数',\n",
    "    '当前贷款金额', \n",
    "    '当前信用余额', \n",
    "    '月债务', \n",
    "    '信用评分'\n",
    "]\n",
    "\n",
    "# 初始化归一化器 (MinMaxScaler)\n",
    "scaler = MinMaxScaler() \n",
    "\n",
    "\n",
    "# 仅在训练集上 fit (学习最大值和最小值)\n",
    "# 然后对训练集进行 transform (应用缩放)\n",
    "# 注意：Scikit-learn 返回 NumPy 数组，需要重新赋值给 DataFrame\n",
    "X_train[continuous_features] = scaler.fit_transform(X_train[continuous_features])\n",
    "\n",
    "# 使用训练集学到的参数 (scaler) 直接对测试集进行 transform\n",
    "# 绝对不能对测试集使用 fit_transform()\n",
    "X_test[continuous_features] = scaler.transform(X_test[continuous_features])\n",
    "X_test\n"
   ]
  },
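  {
   "cell_type": "markdown",
   "id": "a4d92c17",
   "metadata": {},
   "source": [
    "As a worked formula for the scaling step above (the notation is introduced here for illustration and assumes the default `feature_range=(0, 1)`): for each continuous feature $x$, `MinMaxScaler` learns the minimum and maximum on the training set only and applies the same mapping to both splits:\n",
    "\n",
    "$$x' = \\frac{x - x_{\\min}^{\\text{train}}}{x_{\\max}^{\\text{train}} - x_{\\min}^{\\text{train}}}$$\n",
    "\n",
    "Training values therefore land in $[0, 1]$, while a test value below the training minimum or above the training maximum maps outside that range. That is expected and is not a reason to refit the scaler on the test set."
   ]
  },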
  {
   "cell_type": "markdown",
   "id": "8b96a197",
   "metadata": {},
   "source": [
    "## 2.3 模型训练与评估\n",
    "三行经典代码\n",
    "1. 模型实例化\n",
    "2. 模型训练（代入训练集）\n",
    "3. 模型预测 （代入测试集）\n",
    "\n",
    "测试集的预测值和测试集的真实值进行对比，得到混淆矩阵\n",
    "\n",
    "- 基于混淆矩阵，计算准确率、召回率、F1值，这些都是固定阈值的评估指标\n",
    "\n",
    "- AUC是基于不同阈值得到不同的混淆矩阵，然后计算每个阈值对应FPR和TPR，讲这些点连成线，最后求曲线下的面积，得到AUC值"
   ]
  },
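  {
   "cell_type": "markdown",
   "id": "c81e5b3a",
   "metadata": {},
   "source": [
    "For reference, the two rates behind the ROC curve can be written in terms of the confusion-matrix counts. These are the standard definitions; the threshold symbol $t$ is introduced here for illustration:\n",
    "\n",
    "$$\\mathrm{TPR}(t) = \\frac{TP(t)}{TP(t) + FN(t)}, \\qquad \\mathrm{FPR}(t) = \\frac{FP(t)}{FP(t) + TN(t)}$$\n",
    "\n",
    "$$\\mathrm{AUC} = \\int_0^1 \\mathrm{TPR} \\, d(\\mathrm{FPR})$$\n",
    "\n",
    "TPR is the same quantity as recall for the positive class. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation."
   ]
  },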
  {
   "cell_type": "code",
   "execution_count": 279,
   "id": "0885a3e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# #安装xgboost库\n",
    "# !pip install xgboost -i https://pypi.tuna.tsinghua.edu.cn/simple/ \n",
    "# #安装lightgbm库 \n",
    "# !pip install lightgbm  -i https://pypi.tuna.tsinghua.edu.cn/simple/ \n",
    "# #安装catboost库\n",
    "# !pip install catboost -i https://pypi.tuna.tsinghua.edu.cn/simple/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 280,
   "id": "275e986f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC #支持向量机分类器\n",
    "from sklearn.neighbors import KNeighborsClassifier #K近邻分类器\n",
    "from sklearn.linear_model import LogisticRegression #逻辑回归分类器\n",
    "import xgboost as xgb #XGBoost分类器\n",
    "import lightgbm as lgb #LightGBM分类器\n",
    "from sklearn.ensemble import RandomForestClassifier #随机森林分类器\n",
    "from catboost import CatBoostClassifier #CatBoost分类器\n",
    "from sklearn.tree import DecisionTreeClassifier #决策树分类器\n",
    "from sklearn.naive_bayes import GaussianNB #高斯朴素贝叶斯分类器\n",
    "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score # 用于评估分类器性能的指标\n",
    "from sklearn.metrics import classification_report, confusion_matrix #用于生成分类报告和混淆矩阵\n",
    "import warnings #用于忽略警告信息\n",
    "warnings.filterwarnings(\"ignore\") # 忽略所有警告信息\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 281,
   "id": "bfb215b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "SVM 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.76      1.00      0.86       997\n",
      "           1       0.96      0.20      0.33       400\n",
      "\n",
      "    accuracy                           0.77      1397\n",
      "   macro avg       0.86      0.60      0.60      1397\n",
      "weighted avg       0.82      0.77      0.71      1397\n",
      "\n",
      "SVM 混淆矩阵：\n",
      "[[994   3]\n",
      " [320  80]]\n",
      "SVM 模型评估指标：\n",
      "准确率: 0.7688\n",
      "精确率: 0.9639\n",
      "召回率: 0.2000\n",
      "F1 值: 0.3313\n"
     ]
    }
   ],
   "source": [
    "# SVM\n",
    "svm_model = SVC(random_state=42)\n",
    "svm_model.fit(X_train, y_train)\n",
    "svm_pred = svm_model.predict(X_test)\n",
    "\n",
    "print(\"\\nSVM 分类报告：\")\n",
    "print(classification_report(y_test, svm_pred))  # 打印分类报告\n",
    "print(\"SVM 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, svm_pred))  # 打印混淆矩阵\n",
    "\n",
    "# 计算 SVM 评估指标，这些指标默认计算正类的性能\n",
    "svm_accuracy = accuracy_score(y_test, svm_pred)\n",
    "svm_precision = precision_score(y_test, svm_pred)\n",
    "svm_recall = recall_score(y_test, svm_pred)\n",
    "svm_f1 = f1_score(y_test, svm_pred)\n",
    "print(\"SVM 模型评估指标：\")\n",
    "print(f\"准确率: {svm_accuracy:.4f}\")\n",
    "print(f\"精确率: {svm_precision:.4f}\")\n",
    "print(f\"召回率: {svm_recall:.4f}\")\n",
    "print(f\"F1 值: {svm_f1:.4f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54c99822",
   "metadata": {},
   "source": [
    "classification_report它会生成所有类别的指标\n",
    "\n",
    "准确率（Accuracy）是一个全局指标，衡量所有类别预测正确的比例 (TP + TN) / (TP + TN + FP + FN)。它不区分正负类，所以它只有一个值，不区分类别\n",
    "\n",
    "单独调用的 precision_score, recall_score, f1_score 在二分类中默认只计算正类（标签 1）的性能。由于模型从未成功预测出类别 1（TP=0），所以这些指标对类别 1 来说都是 0。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 282,
   "id": "5c6caa6e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "KNN 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.78      0.89      0.83       997\n",
      "           1       0.56      0.36      0.44       400\n",
      "\n",
      "    accuracy                           0.74      1397\n",
      "   macro avg       0.67      0.63      0.63      1397\n",
      "weighted avg       0.72      0.74      0.72      1397\n",
      "\n",
      "KNN 混淆矩阵：\n",
      "[[883 114]\n",
      " [254 146]]\n",
      "KNN 模型评估指标：\n",
      "准确率: 0.7366\n",
      "精确率: 0.5615\n",
      "召回率: 0.3650\n",
      "F1 值: 0.4424\n"
     ]
    }
   ],
   "source": [
    "# KNN\n",
    "knn_model = KNeighborsClassifier()\n",
    "knn_model.fit(X_train, y_train)\n",
    "knn_pred = knn_model.predict(X_test)\n",
    "\n",
    "print(\"\\nKNN 分类报告：\")\n",
    "print(classification_report(y_test, knn_pred))\n",
    "print(\"KNN 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, knn_pred))\n",
    "\n",
    "knn_accuracy = accuracy_score(y_test, knn_pred)\n",
    "knn_precision = precision_score(y_test, knn_pred)\n",
    "knn_recall = recall_score(y_test, knn_pred)\n",
    "knn_f1 = f1_score(y_test, knn_pred)\n",
    "print(\"KNN 模型评估指标：\")\n",
    "print(f\"准确率: {knn_accuracy:.4f}\")\n",
    "print(f\"精确率: {knn_precision:.4f}\")\n",
    "print(f\"召回率: {knn_recall:.4f}\")\n",
    "print(f\"F1 值: {knn_f1:.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 283,
   "id": "72beab75",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "逻辑回归 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.77      0.99      0.87       997\n",
      "           1       0.89      0.28      0.42       400\n",
      "\n",
      "    accuracy                           0.78      1397\n",
      "   macro avg       0.83      0.63      0.64      1397\n",
      "weighted avg       0.81      0.78      0.74      1397\n",
      "\n",
      "逻辑回归 混淆矩阵：\n",
      "[[984  13]\n",
      " [290 110]]\n",
      "逻辑回归 模型评估指标：\n",
      "准确率: 0.7831\n",
      "精确率: 0.8943\n",
      "召回率: 0.2750\n",
      "F1 值: 0.4207\n"
     ]
    }
   ],
   "source": [
    "# 逻辑回归\n",
    "logreg_model = LogisticRegression(random_state=42)\n",
    "logreg_model.fit(X_train, y_train)\n",
    "logreg_pred = logreg_model.predict(X_test)\n",
    "\n",
    "print(\"\\n逻辑回归 分类报告：\")\n",
    "print(classification_report(y_test, logreg_pred))\n",
    "print(\"逻辑回归 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, logreg_pred))\n",
    "\n",
    "logreg_accuracy = accuracy_score(y_test, logreg_pred)\n",
    "logreg_precision = precision_score(y_test, logreg_pred)\n",
    "logreg_recall = recall_score(y_test, logreg_pred)\n",
    "logreg_f1 = f1_score(y_test, logreg_pred)\n",
    "print(\"逻辑回归 模型评估指标：\")\n",
    "print(f\"准确率: {logreg_accuracy:.4f}\")\n",
    "print(f\"精确率: {logreg_precision:.4f}\")\n",
    "print(f\"召回率: {logreg_recall:.4f}\")\n",
    "print(f\"F1 值: {logreg_f1:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66577e89",
   "metadata": {},
   "source": [
    "我们来解读一下这个输出的表格，看看能看出来哪些信息？\n",
    "\n",
    "1. Precision (精确率)\t在所有模型预测为该类别的样本中，真正属于该类别的比例。\n",
    "2. Recall (召回率)\t在所有真正属于该类别的样本中，被模型正确识别的比例。\n",
    "\n",
    "因此，分类报告必须给出 $0$ 和 $1$ 的详细指标，以便了解模型在预测两种不同结果时的偏向和能力差异。\n",
    "\n",
    "准确率 (Accuracy)： 这是整体指标，计算的是 $(TP + TN) / Total$，与 $0$ 或 $1$ 无关，所以只有一个总值。\n",
    "\n",
    "\n",
    "在二分类问题中，Scikit-learn 的评估函数（如 precision_score, recall_score, f1_score）在默认情况下，会将标签 $1$ 视为重点关注的正类来计算指标。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b014a75b",
   "metadata": {},
   "source": [
    "此外，support（样本数）显示，类别 $0$ 有 $997$ 个样本，而类别 $1$ 只有 $400$ 个样本。这是一个不平衡数据集。逻辑回归模型严重偏向于预测类别 $0$（未违约），模型在预测 $1$（违约）时很“谨慎”（高 Precision: $0.89$），但它错过了大量真正的违约者（低 Recall: $0.28$）。\n",
    "\n",
    "1. 精确率关注的是“误报”（False Positive）——即把非违约客户错判为违约客户的错误。预测为违约的客户中有 $89.43\\%$ 是正确的。误报率低。\n",
    "2. 召回率关注的是“漏报”（False Negative）——即把真正违约的客户错判为未违约客户的错误。模型只识别出了所有真正违约客户中的 $27.50\\%$。漏报率极高。"
   ]
  },
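  {
   "cell_type": "markdown",
   "id": "f29b6d05",
   "metadata": {},
   "source": [
    "As a check on the numbers above, the logistic regression metrics can be recomputed by hand from its confusion matrix. scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, so `[[984, 13], [290, 110]]` reads as $TN = 984$, $FP = 13$, $FN = 290$, $TP = 110$:\n",
    "\n",
    "$$\\text{Precision} = \\frac{TP}{TP + FP} = \\frac{110}{123} \\approx 0.8943, \\qquad \\text{Recall} = \\frac{TP}{TP + FN} = \\frac{110}{400} = 0.2750$$\n",
    "\n",
    "$$F_1 = \\frac{2\\,TP}{2\\,TP + FP + FN} = \\frac{220}{523} \\approx 0.4207, \\qquad \\text{Accuracy} = \\frac{TP + TN}{1397} = \\frac{1094}{1397} \\approx 0.7831$$\n",
    "\n",
    "These match the printed values, which confirms that the standalone scores describe class $1$ only."
   ]
  },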
  {
   "cell_type": "code",
   "execution_count": 284,
   "id": "287d737b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "朴素贝叶斯 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.96      0.21      0.35       997\n",
      "           1       0.33      0.98      0.50       400\n",
      "\n",
      "    accuracy                           0.43      1397\n",
      "   macro avg       0.65      0.60      0.42      1397\n",
      "weighted avg       0.78      0.43      0.39      1397\n",
      "\n",
      "朴素贝叶斯 混淆矩阵：\n",
      "[[213 784]\n",
      " [  9 391]]\n",
      "朴素贝叶斯 模型评估指标：\n",
      "准确率: 0.4324\n",
      "精确率: 0.3328\n",
      "召回率: 0.9775\n",
      "F1 值: 0.4965\n"
     ]
    }
   ],
   "source": [
    "# 朴素贝叶斯\n",
    "nb_model = GaussianNB()\n",
    "nb_model.fit(X_train, y_train)\n",
    "nb_pred = nb_model.predict(X_test)\n",
    "\n",
    "print(\"\\n朴素贝叶斯 分类报告：\")\n",
    "print(classification_report(y_test, nb_pred))\n",
    "print(\"朴素贝叶斯 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, nb_pred))\n",
    "\n",
    "nb_accuracy = accuracy_score(y_test, nb_pred)\n",
    "nb_precision = precision_score(y_test, nb_pred)\n",
    "nb_recall = recall_score(y_test, nb_pred)\n",
    "nb_f1 = f1_score(y_test, nb_pred)\n",
    "print(\"朴素贝叶斯 模型评估指标：\")\n",
    "print(f\"准确率: {nb_accuracy:.4f}\")\n",
    "print(f\"精确率: {nb_precision:.4f}\")\n",
    "print(f\"召回率: {nb_recall:.4f}\")\n",
    "print(f\"F1 值: {nb_f1:.4f}\")\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 285,
   "id": "c97db71d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "决策树 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.77      0.76      0.77       997\n",
      "           1       0.43      0.45      0.44       400\n",
      "\n",
      "    accuracy                           0.67      1397\n",
      "   macro avg       0.60      0.60      0.60      1397\n",
      "weighted avg       0.68      0.67      0.67      1397\n",
      "\n",
      "决策树 混淆矩阵：\n",
      "[[761 236]\n",
      " [222 178]]\n",
      "决策树 模型评估指标：\n",
      "准确率: 0.6722\n",
      "精确率: 0.4300\n",
      "召回率: 0.4450\n",
      "F1 值: 0.4373\n"
     ]
    }
   ],
   "source": [
    "# 决策树\n",
    "dt_model = DecisionTreeClassifier(random_state=42)\n",
    "dt_model.fit(X_train, y_train)\n",
    "dt_pred = dt_model.predict(X_test)\n",
    "\n",
    "print(\"\\n决策树 分类报告：\")\n",
    "print(classification_report(y_test, dt_pred))\n",
    "print(\"决策树 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, dt_pred))\n",
    "\n",
    "dt_accuracy = accuracy_score(y_test, dt_pred)\n",
    "dt_precision = precision_score(y_test, dt_pred)\n",
    "dt_recall = recall_score(y_test, dt_pred)\n",
    "dt_f1 = f1_score(y_test, dt_pred)\n",
    "print(\"决策树 模型评估指标：\")\n",
    "print(f\"准确率: {dt_accuracy:.4f}\")\n",
    "print(f\"精确率: {dt_precision:.4f}\")\n",
    "print(f\"召回率: {dt_recall:.4f}\")\n",
    "print(f\"F1 值: {dt_f1:.4f}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 286,
   "id": "1164cd92",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "随机森林 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.78      0.96      0.86       997\n",
      "           1       0.77      0.31      0.44       400\n",
      "\n",
      "    accuracy                           0.78      1397\n",
      "   macro avg       0.77      0.64      0.65      1397\n",
      "weighted avg       0.78      0.78      0.74      1397\n",
      "\n",
      "随机森林 混淆矩阵：\n",
      "[[960  37]\n",
      " [275 125]]\n",
      "随机森林 模型评估指标：\n",
      "准确率: 0.7767\n",
      "精确率: 0.7716\n",
      "召回率: 0.3125\n",
      "F1 值: 0.4448\n"
     ]
    }
   ],
   "source": [
    "# 随机森林\n",
    "rf_model = RandomForestClassifier(random_state=42)\n",
    "rf_model.fit(X_train, y_train)\n",
    "rf_pred = rf_model.predict(X_test)\n",
    "\n",
    "print(\"\\n随机森林 分类报告：\")\n",
    "print(classification_report(y_test, rf_pred))\n",
    "print(\"随机森林 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, rf_pred))\n",
    "\n",
    "rf_accuracy = accuracy_score(y_test, rf_pred)\n",
    "rf_precision = precision_score(y_test, rf_pred)\n",
    "rf_recall = recall_score(y_test, rf_pred)\n",
    "rf_f1 = f1_score(y_test, rf_pred)\n",
    "print(\"随机森林 模型评估指标：\")\n",
    "print(f\"准确率: {rf_accuracy:.4f}\")\n",
    "print(f\"精确率: {rf_precision:.4f}\")\n",
    "print(f\"召回率: {rf_recall:.4f}\")\n",
    "print(f\"F1 值: {rf_f1:.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 287,
   "id": "44c59c97",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "XGBoost 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.79      0.92      0.85       997\n",
      "           1       0.66      0.38      0.48       400\n",
      "\n",
      "    accuracy                           0.77      1397\n",
      "   macro avg       0.72      0.65      0.66      1397\n",
      "weighted avg       0.75      0.77      0.74      1397\n",
      "\n",
      "XGBoost 混淆矩阵：\n",
      "[[917  80]\n",
      " [248 152]]\n",
      "XGBoost 模型评估指标：\n",
      "准确率: 0.7652\n",
      "精确率: 0.6552\n",
      "召回率: 0.3800\n",
      "F1 值: 0.4810\n"
     ]
    }
   ],
   "source": [
    "# XGBoost\n",
    "xgb_model = xgb.XGBClassifier(random_state=42)\n",
    "xgb_model.fit(X_train, y_train)\n",
    "xgb_pred = xgb_model.predict(X_test)\n",
    "\n",
    "print(\"\\nXGBoost 分类报告：\")\n",
    "print(classification_report(y_test, xgb_pred))\n",
    "print(\"XGBoost 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, xgb_pred))\n",
    "\n",
    "xgb_accuracy = accuracy_score(y_test, xgb_pred)\n",
    "xgb_precision = precision_score(y_test, xgb_pred)\n",
    "xgb_recall = recall_score(y_test, xgb_pred)\n",
    "xgb_f1 = f1_score(y_test, xgb_pred)\n",
    "print(\"XGBoost 模型评估指标：\")\n",
    "print(f\"准确率: {xgb_accuracy:.4f}\")\n",
    "print(f\"精确率: {xgb_precision:.4f}\")\n",
    "print(f\"召回率: {xgb_recall:.4f}\")\n",
    "print(f\"F1 值: {xgb_f1:.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 288,
   "id": "b4242953",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Info] Number of positive: 1630, number of negative: 3957\n",
      "[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000489 seconds.\n",
      "You can set `force_col_wise=true` to remove the overhead.\n",
      "[LightGBM] [Info] Total Bins 1888\n",
      "[LightGBM] [Info] Number of data points in the train set: 5587, number of used features: 16\n",
      "[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.291749 -> initscore=-0.886906\n",
      "[LightGBM] [Info] Start training from score -0.886906\n",
      "\n",
      "LightGBM 分类报告：\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.78      0.93      0.85       997\n",
      "           1       0.68      0.36      0.47       400\n",
      "\n",
      "    accuracy                           0.77      1397\n",
      "   macro avg       0.73      0.65      0.66      1397\n",
      "weighted avg       0.75      0.77      0.74      1397\n",
      "\n",
      "LightGBM 混淆矩阵：\n",
      "[[928  69]\n",
      " [256 144]]\n",
      "LightGBM 模型评估指标：\n",
      "准确率: 0.7674\n",
      "精确率: 0.6761\n",
      "召回率: 0.3600\n",
      "F1 值: 0.4698\n"
     ]
    }
   ],
   "source": [
    "# LightGBM\n",
    "lgb_model = lgb.LGBMClassifier(random_state=42)\n",
    "lgb_model.fit(X_train, y_train)\n",
    "lgb_pred = lgb_model.predict(X_test)\n",
    "\n",
    "print(\"\\nLightGBM 分类报告：\")\n",
    "print(classification_report(y_test, lgb_pred))\n",
    "print(\"LightGBM 混淆矩阵：\")\n",
    "print(confusion_matrix(y_test, lgb_pred))\n",
    "\n",
    "lgb_accuracy = accuracy_score(y_test, lgb_pred)\n",
    "lgb_precision = precision_score(y_test, lgb_pred)\n",
    "lgb_recall = recall_score(y_test, lgb_pred)\n",
    "lgb_f1 = f1_score(y_test, lgb_pred)\n",
    "print(\"LightGBM 模型评估指标：\")\n",
    "print(f\"准确率: {lgb_accuracy:.4f}\")\n",
    "print(f\"精确率: {lgb_precision:.4f}\")\n",
    "print(f\"召回率: {lgb_recall:.4f}\")\n",
    "print(f\"F1 值: {lgb_f1:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13f1e071",
   "metadata": {},
   "source": [
    "| 模型名称 | 准确率 | 精确率（正类） | 召回率（正类） | F1值（正类） | 精确率（负类） | 召回率（负类） | F1值（负类） |\n",
    "| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |\n",
    "| SVM | 0.7060 | 0.0000 | 0.0000 | 0.0000 | 0.71 | 1.00 | 0.83 |\n",
    "| KNN | 0.6753 | 0.4102 | 0.2381 | 0.3013 | 0.73 | 0.86 | 0.79 |\n",
    "| 逻辑回归 | 0.7560 | 0.8571 | 0.2041 | 0.3297 | 0.75 | 0.99 | 0.85 |\n",
    "| 朴素贝叶斯 | 0.4267 | 0.3377 | 0.9887 | 0.5035 | 0.98 | 0.19 | 0.32 |\n",
    "| 决策树 | 0.6773 | 0.4564 | 0.5102 | 0.4818 | 0.79 | 0.75 | 0.77 |\n",
    "| 随机森林 | 0.7700 | 0.7857 | 0.2993 | 0.4335 | 0.77 | 0.97 | 0.86 |\n",
    "| XGBoost | 0.7473 | 0.6192 | 0.3651 | 0.4593 | 0.77 | 0.91 | 0.84 |\n",
    "| LightGBM | 0.7660 | 0.7009 | 0.3560 | 0.4722 | 0.78 | 0.94 | 0.85 | "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "440ed2bd",
   "metadata": {},
   "source": [
    "对于指标怎么看呢？\n",
    "1. 一般文章都是会选择一个作为主指标，比如F1分数，然后筛选出最好的模型\n",
    "2. 一般最好的指标对应的模型，其他的指标也都很好\n",
    "3. 数学建模中有评价任务，比如根据熵权法进行判断。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "vs",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
